It is generally easier to disambiguate people with uncommon names than people with common
names; in the extreme case a name can be so uncommon that it is used by only a single person
on the planet, and no disambiguation is necessary. This thesis explores the use of uncommon
names to correlate identity records stored in DoD411 with user profile pages stored on three
popular social network sites: LinkedIn, Facebook, and MySpace. After grounding the approach
in  theory,  a  working  correlation  system  is  presented.  We  then  statistically  sample  the  results 
of  the  correlation  to  infer  statistics  about  the  use  of  social  network  sites  by  DoD  personnel. 
Among  the  results  that  we  present  are  the  percentage  of  DoD  personnel  that  have  Facebook 
pages;  the  ready  availability  of  information  about  DoD  families  from  information  that  DoD 
personnel  have  voluntarily  released  on  social  network  sites;  and  the  availability  of  information 
related  to  specific  military  operations  and  unit  deployments  provided  by  DoD  members  and 
their  associates  on  social  network  sites.  We  conclude  with  a  brief  analysis  of  the  privacy  and 
policy  implications  of  this  work. 

CHAPTER 1:
Introduction 


1.1  Social  Networks  and  the  Department  of  Defense 

The  use  of  social  network  sites  within  the  DoD  is  becoming  more  widespread  and  is  not  limited 
to  personnel,  but  is  becoming  increasingly  common  within  organizations.  There  is  also  growing 
concern  regarding  the  use  of  such  sites.  Several  organizations  within  the  DoD,  most  notably  the 
Marine  Corps,  previously  banned  the  use  of  such  sites  on  DoD  computers  and  networks,  but 
those  bans  were  rescinded  in  early  2010  after  a  DoD  Memorandum  specifically  permitted  the 
use of such sites on the NIPRNET [1].

This thesis explores how official DoD information can be correlated with data from social network
sites, showing that there may be risks in social network use that are not obvious to today’s
warfighters.


1.2  Background 

In their article Social Network Sites: Definition, History, and Scholarship, social media
researchers boyd [sic] and Ellison define a social network site as a web-based service that allows
individual  users  to  do  three  things:  (1)  They  must  be  able  to  “construct  a  public  or  semi-public 
profile  within  a  bounded  system,”  (2)  they  must  be  able  to  view  a  list  of  other  users  with  whom 
they  share  a  connection,  and  (3),  they  must  be  able  to  “view  and  traverse  their  list  of  connections 
and  those  made  by  others  within  the  system.”  The  authors  further  assert  that  the  idea  that  makes 
social  network  sites  powerful  is  not  that  they  give  users  the  ability  to  meet  strangers,  but  rather 
that  they  enable  users  to  articulate  and  make  visible  their  social  networks  [2]. 

Most of today’s social network sites meet the first criterion by allowing users to create a profile
of themselves, typically including the user’s name, photo, email address, birth date, interests,
and other personal information. Some sites allow profiles to be visible to everyone, even viewers
without an account. Other sites allow users to choose the visibility of their profile for
different groups of viewers, such as Facebook’s “Friends” group, “Friends of Friends”
group, and “Everyone” group.

The second criterion is typically met when users are asked to identify others in the system with
whom  they  would  like  to  have  a  connection.  On  many  sites,  a  connection  between  two  users  is 
only  established  after  both  users  confirm  the  connection.  Different  sites  use  different  terms  to 
identify these connections. LinkedIn uses the term “Connection,” while MySpace and Facebook
use  the  term  “Friend.” 

The third criterion is met on most sites by publicly displaying a person’s list of connections or
“Friends”  on  their  profile  page.  This  allows  viewers  to  traverse  the  network  graph  by  clicking 
through  the  list  of  “Friends.” 

1.2.1  History  of  Social  Network  Sites 

For more than three decades computer networks have played host to an array of services designed
to facilitate communication among groups of people. Among the earliest precursors
to modern social network sites were electronic Bulletin Board Systems (BBSs) [3]. The first
BBS, called Computerized Bulletin Board System, debuted in 1978 and was soon followed by
other, similar systems [4]. These BBSs, which remained popular through the 1990s, let
groups  form  around  specific  topics  of  interest  by  allowing  users  to  post  and  read  messages  from 
a  central  location. 

After  the  commercial  Internet  service  providers  (ISPs)  brought  the  Internet  to  more  “average” 
users, Web sites devoted to online social interaction began to appear. AOL provided its customers
with member-created communities including searchable member profiles in which users
could include personal details [3]. GeoCities and TheGlobe, created in 1994, let users create
their own HTML member pages and provided chat rooms, galleries, and message boards [4]. In
1995  Classmates.com  launched;  this  service  didn’t  allow  users  to  create  their  own  profiles,  but 
did  allow  members  to  search  for  their  school  friends  [4].  AOL’s  1997  release  of  AOL  Instant 
Messenger  helped  bring  instant  messaging  to  the  mainstream,  one  more  step  on  the  way  to 
today’s  social  network  sites  [4]. 

Another  1997  release,  SixDegrees.com,  was  the  first  site  to  combine  all  of  the  features  defined 
by boyd and Ellison as “essential” to a social network site. SixDegrees allowed users to create
personal  profiles,  form  connections  with  friends,  and  browse  other  users’  profiles  [3].  Ryze.com 
opened in 2001 as a social network site with the goal of helping people leverage business networks.
It was soon followed by Friendster in 2002, which was intended as a social complement
to Ryze [2]. Although Friendster did not become immensely popular in the U.S., it is still a
leading social network site globally: it boasts more than 115 million members worldwide and
is a top-25 global Web site serving over 9 billion pages per month [5].

A  new  social  network  site,  MySpace,  officially  launched  in  January  2004  and  hit  1  million 
members  by  February  of  that  year.  By  July  2005,  MySpace  boasted  20  million  unique  users  and 
was  acquired  by  News  Corporation  [6].  As  of  January  2010,  MySpace  has  70  million  unique 
users  in  the  U.S.  and  more  than  100  million  monthly  active  users  globally  [7]. 

In 2003 LinkedIn brought a more serious approach to social network sites with its goal of
appealing to businesspeople wanting to connect with other professionals [3]. LinkedIn has remained
popular among professionals and as of early 2010 has over 60 million members worldwide,
including executives from all Fortune 500 companies [8].

Facebook,  founded  by  Mark  Zuckerberg  in  February  2004,  began  as  an  exclusive  site  allowing 
only  participants  with  a  Harvard.edu  email  address.  One  month  later  it  expanded  to  allow 
participants  from  Stanford,  Columbia,  and  Yale.  More  universities  were  added  throughout  2004 
and  in  September  2005  high  school  networks  were  allowed.  Facebook  opened  to  the  general 
public  in  September  2006  [9].  The  site  has  continued  to  expand  and  became  the  leading  social 
network site in the U.S. after surpassing MySpace in December 2008 [10] (see Figure 1.1). In
March 2010, Facebook.com surpassed Google.com in weekly Internet visits originating in the
U.S., making it the most visited site in the U.S. for that week [11] (see Figure 1.2). The number
of  Facebook  members  doubled  during  2009  from  200  million  to  400  million  [12]. 

A  visual  comparison  of  the  growth  in  popularity  of  a  few  selected  sites  is  shown  in  Figure  1.3, 
which  shows  each  site’s  daily  traffic  rank  over  the  past  two  years.  A  separate  visual  comparison 
of each site’s popularity is shown in Figure 1.4, which we generated using Google Insights for
Search,¹ a tool that compares the popularity of search terms over time. We compared the search
terms “Facebook,” “MySpace,” “LinkedIn,” and “Twitter” as an estimate of the popularity of
those  sites.  We  limited  the  comparison  to  search  statistics  from  the  U.S.  only.  Note  that  this 
chart  shows  Facebook  surpassing  MySpace  in  popularity  at  approximately  the  same  time  as  the 
charts  in  Figure  1.1  and  Figure  1.3.  See  Table  1.1  for  a  summary  of  several  popular  sites. 

¹ http://www.google.com/insights/search/#

Figure 1.1: Facebook surpasses MySpace in U.S. unique visits. Graphic from [10].

Site        Launch Date    Current Membership
LinkedIn    May 2003       60 million
MySpace     Jan 2004       100 million
Facebook    Feb 2004       400 million

Table 1.1: Summary statistics on various social network sites. Current membership numbers
are from March 2010.

1.2.2 Facebook Applications

Facebook Platform is a set of APIs and tools that enable applications to interact with the
Facebook social graph and other Facebook features. Developers can create applications that
integrate with users’ Facebook pages. Examples of popular Facebook applications include:


•  Photos  -  Allows  users  to  upload  and  share  an  unlimited  number  of  photos. 

•  Movies  -  Users  can  rate  movies  and  share  movies  that  they  have  seen  or  want  to  see  with 
their  friends. 

•  Farmville  -  A  farm  simulation  game  that  allows  users  to  manage  a  virtual  farm.  Players 
can  purchase  virtual  goods  or  currency  to  help  them  advance  in  the  game. 

•  Daily  Horoscope  -  Users  get  a  personalized  daily  horoscope. 

•  IQ  Test  -  A  short  quiz  that  lets  users  test  their  IQ. 

•  Social  Interview  -  A  quiz  that  asks  users  to  answer  questions  about  their  friends. 

Figure 1.2: Facebook surpasses Google in the U.S. for the week ending March 13, 2010.
Graphic from [11].


Facebook applications range from useful utilities, like the Photos application, to intrusive surveys
that ask users to answer personal questions about their friends. All of these applications
are able to access users’ profile information and the profile information of their Friends with the
same level of privilege as the user of the application. This means that even users who have not
authorized or used a particular application can have their personal information exposed to any
application used by one of their Friends [13].

It is important to note that most of these applications are developed and controlled by third
parties. Most users don’t realize that even if they set their Facebook privacy settings in such
a  way  that  only  Friends  can  view  their  personal  information,  any  application  that  their  Friends 
authorize  can  also  view  their  Friend-only  information. 

At the Facebook F8 conference on April 21, 2010, several changes to the Facebook Platform
were announced. Facebook CEO Mark Zuckerberg said that Facebook is getting rid of
the policy preventing developers from caching or storing users’ personal data for more than 24
hours. Bret Taylor, Facebook’s Head of Platform Products, announced that developers will now
have the ability to search over all the public updates on Facebook and that Facebook is adding
callbacks that will notify developers whenever a user of their application updates their profile,
adds a new connection, or posts a new wall post [14]. These changes will give developers
even more access to users’ private data and release most of the restrictions on what they can
do with that data.

Figure 1.3: Comparison of daily traffic rank from March 2008 to March 2010 for
Facebook, MySpace, LinkedIn, Friendster, and Twitter using Alexa.com traffic statistics
(http://www.alexa.com/siteinfo/facebook.com+myspace.com+linkedin.com+friendster.com+twitter.com#trafficstats).

On  May  26,  2010,  Zuckerberg  made  an  announcement  of  more  changes  to  the  Facebook  privacy 
policy  and  settings.  The  new  changes  will  allow  users  to  turn  off  Facebook  Platform,  which  will 
prevent  any  applications  from  accessing  their  personal  data  [15]. 

Companies  that  develop  Facebook  applications  stand  to  profit  from  access  to  users’  private  data. 
These  applications  can  generate  a  revenue  stream  through  various  business  models  including 
advertising,  subscriptions,  virtual  money,  and  affiliate  fees.  As  applications  are  able  to  access 
user  data  more  freely,  they  can  more  effectively  target  users  for  advertising  purposes. 

An important point is that there are no technical restrictions that limit what developers or applications
can do with the information they collect on users.

Figure 1.4: Comparison of relative number of searches done on Google for Facebook, MySpace,
LinkedIn, and Twitter from January 2004 to March 2010. Numbers are normalized to fit a
scale of 0-100. See
http://www.google.com/insights/search/#q=facebook%2Cmyspace%2Clinkedin%2Ctwitter&geo=US&cmpt=q.


1.2.3  DoD411 

The  Department  of  Defense  Global  Directory  Service  (GDS),  also  known  as  DoD411,  is  an 
enterprise-wide  directory  service  that  provides  the  ability  to  search  for  basic  information  (name, 
email  address,  and  public  key  email  certificate)  about  DoD  personnel  who  have  a  DoD  Public 
Key  Infrastructure  (PKI)  certificate  on  the  Unclassified  but  Sensitive  Internet  Protocol  Router 
Network  (NIPRNET)  and  the  Secret  Internet  Protocol  Router  Network  (SIPRNET)  [16].  The 
DoD411 service can be accessed with a valid DoD PKI certificate using a web browser at
https://dod411.gds.disa.mil. The service can also be accessed with a Lightweight
Directory Access Protocol (LDAP) client without using a valid DoD PKI certificate at
ldap://dod411.gds.disa.mil. DoD411 stores the full name, email address, organization
(USAF,  USCG,  etc.),  employee  number,  and  public  key  email  certificate  of  all  DoD  PKI  users, 
including  both  active  duty  and  reserve  members,  civilian  employees,  and  contractors.  LDAP 
access  to  the  directory  is  allowed  so  that  email  clients  can  access  the  public  key  certificates  of 
email  recipients  in  order  to  encrypt  an  email  message  [17]. 
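
As a minimal sketch of what such a certificate lookup involves, the fragment below builds the
kind of LDAP search filter a mail client might send to find a recipient's directory entry. The
attribute name `mail`, the helper names, and the example address are assumptions made for
illustration; the actual DoD411 schema is not described here. The escaping rules follow RFC 4515.

```python
# Hypothetical helpers for illustration only; the attribute name "mail"
# is an assumption and is not taken from the DoD411 schema.
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}


def escape_filter_value(value: str) -> str:
    """Escape characters that have special meaning in an LDAP filter (RFC 4515)."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def build_mail_filter(email: str) -> str:
    """Return an equality filter that matches one entry by email address."""
    return f"(mail={escape_filter_value(email)})"


if __name__ == "__main__":
    # Example address is invented.
    print(build_mail_filter("jane.doe@example.mil"))
```

Escaping the value before it is placed in the filter keeps a stray `*` or parenthesis in the
input from changing the meaning of the query.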

1.3  Motivation 

1.3.1  True  Names  and  Privacy  Settings 

Users  of  social  networking  sites  typically  fill  out  their  profile  information  using  their  real  names, 
email addresses, and other personal information. Users of these sites even provide personal details
including educational background, professional background, interests and hobbies, activities
they are currently involved in, and the status of their current relationship [18]. According to
Facebook’s developer site, 97% of user profiles include the user’s full name, 85% include a picture,
and 58% include the user’s education history [13]. The Facebook Terms of Service Agreement
prohibits users from providing false personal information or registering an account for
any  person  other  than  oneself  [19].  There  is  even  legal  precedent  for  using  Facebook  accounts 
as  a  valid  means  of  contact  with  a  person  in  legal  matters.  In  December  2008,  an  Australian 
Supreme  Court  judge  ruled  that  court  notices  could  be  served  using  Facebook  [20]. 

Even  though  users  of  social  network  sites  provide  intimate  personal  details  on  the  sites,  most 
users  expect  some  level  of  privacy  and  protection  of  their  personal  information.  Facebook  offers 
privacy  settings  that  allow  users  to  control  who  can  view  their  profile  and  “status  updates”  or 
posts.  However,  according  to  the  Facebook  Privacy  Policy: 

Certain  categories  of  information  such  as  your  name,  profile  photo,  list  of  friends 
and  pages  you  are  a  fan  of,  gender,  geographic  region,  and  networks  you  belong 
to are considered publicly available to everyone, including Facebook-enhanced applications,
and therefore do not have privacy settings. You can, however, limit the
ability  of  others  to  find  this  information  through  search  using  your  search  privacy 
settings  [21]. 

Although  users  can  prevent  their  profile  from  appearing  in  search  results,  they  cannot  prevent 
profile  information  from  being  viewed  by  someone  who  knows  the  URL  to  their  profile  page. 
This becomes important when someone accesses a profile page by clicking on a link to it, such
as  from  the  list  of  Friends  displayed  on  another  user’s  profile  page. 

The privacy settings and policies of specific social network sites frequently change. Until recently,
Facebook’s privacy controls were limited to selecting from “Friends Only,” “Friends-of-Friends,”
and “Everyone.” Beginning in January 2010, the privacy controls were updated
to  allow  more  fine-grained  control  over  who  could  view  a  user’s  profile  and  postings,  even 
allowing  one  to  select  down  to  the  user-level  [22]  (See  Figure  1.5).  Other  changes  made  in 
January  2010  included  a  simplified  privacy  settings  page  and  the  removal  of  regional  networks 
[22].  Although  Facebook  now  offers  finer-grained  privacy  controls,  not  all  users  know  about 
or  make  use  of  them.  During  the  December  2009/January  2010  privacy  controls  update,  users 
were  prompted  by  a  “transition  tool”  with  a  choice  to  keep  their  previous  privacy  settings  or  to 
change  to  settings  recommended  by  Facebook.  One  of  these  new  default  settings  was  to  allow 
“Everyone”  to  see  status  updates.  The  default  setting  for  viewing  certain  profile  information 
was  also  set  to  “Everyone.”  And  the  setting  controlling  whether  a  Facebook  user’s  information 
could be indexed by search engines was set to “Allow” by default [23] [24]. Facebook said 35%
of users had read the new privacy documentation and changed something in the privacy settings,
but this means that 65% of users made their content public by not changing their privacy
settings [25].

Another recent Facebook change required users to “opt out” of sharing personal
information with third parties, rather than the traditional “opt in” setting for sharing private
information. This move prompted a petition to the Federal Trade Commission to investigate
the privacy policies of social network sites for practices that might deliberately mislead or confuse
users. Facebook and other social network sites have a clear financial incentive to allow the
personal information of their users to be shared with advertisers, who can more effectively target
groups and individuals [26].

1.3.2  Threat  to  DoD 

DoD  employees,  warfighters,  and  other  DoD  personnel  are  increasingly  participating  in  social 
network  sites.  Organizations  within  the  DoD  are  beginning  to  use  social  network  sites  for 
distributing  information  and  recruiting.  The  DoD  recently  rescinded  a  ban  on  the  use  of  social 
network sites on DoD networks [1] and the DoD maintains several Web sites devoted to social
media, including http://www.defense.gov/, http://socialmedia.defense.gov/, and
http://www.ntm-a.com/. A complete list of the DoD’s official social media pages is at
http://www.defense.gov/RegisteredSites/SocialMediaSites.aspx. As of this writing, the U.S.
Navy’s official social media sites included 13 blogs, 193 Facebook pages, 28 Flickr sites,
115 Twitter feeds, and 20 YouTube channels.

Figure 1.5: Facebook allows users to specify who can or cannot view their profile information.

With  the  increased  use  of  social  media  and  social  network  sites  across  the  DoD,  there  is  an 
increased  threat.  Possible  threats  to  the  DoD  include  leaking  of  sensitive  information,  exposure 
to  malware  introduced  into  DoD  networks  through  social  media  sites,  and  a  threat  to  DoD 
personnel  and  family  members. 

These  threats  are  not  hypothetical.  Israeli  Defense  Forces  called  off  an  operation  after  a  soldier 
posted  details  of  a  planned  raid  on  his  Facebook  page.  The  soldier  posted  the  location  and  time 
of  the  planned  operation  and  the  name  of  his  unit.  He  was  reported  to  military  authorities  by  his 
Facebook  friends  [27]. 

One post on a jihadist Web site instructed people to gather intelligence about U.S. military units
and  family  members  of  U.S.  service  members: 

...now,  with  Allah’s  help,  all  the  American  vessels  in  the  seas  and  oceans,  including 
aircraft  carriers,  submarines,  and  all  naval  military  equipment  deployed  here  and 
there  that  is  within  range  of  Al-Qaeda’s  fire,  will  be  destroyed... 

To  this  end,  information  on  every  U.S.  naval  unit  and  only  U.S.  [units]!!  should 
be  quietly  gathered  [as  follows:]  [the  vessel’s]  name,  the  missions  it  is  assigned; 
its  current  location,  including  notation  of  the  spot  in  accordance  with  international 
maritime  standards;  the  advantages  of  this  naval  unit;  the  number  of  U.S.  troops  on 
board,  including  if  possible  their  ranks,  and  what  state  they  are  from,  their  family 
situation, and where their family members (wife and children) live;

...monitor every website used by the personnel on these ships, and attempt to discover
what is in these contacts; identify the closest place on land to these ships in
all  directions...;  searching  all  naval  websites  in  order  to  gather  as  much  information 
as  possible,  and  translating  it  into  Arabic;  search  for  the  easiest  ways  of  striking 
these  ships... 

My Muslim brothers, do not underestimate the importance of any piece of information,
as simple as it may seem; the mujahideen, the lions of monotheism, may be
able  to  use  it  in  ways  that  have  not  occurred  to  you.  [28]  (Emphasis  added) 

The U.S. Army’s 2010 “Mad Scientist” Future Technology Seminar, an annual conference looking
at new developments in military science and hardware, found the need to mention the threat
of  social  networking  to  family  members: 

Increasing dependence on social networking systems blended with significant improvements
in immersive 3-D technologies will change the definition of force protection
and redefine the meaning of area of operations. Social networking could
make the family and friends of Soldiers real targets, subsequently requiring increased
protection. Additionally, the mashing of these technologies could potentially
hurt recruitment and retention efforts. Some of our more advanced potential
adversaries, including China, have begun work in the social networking arena.
However, future blending of social networks and Immersive 3-D technology makes
it increasingly likely that engagements will take place outside physical space and
will expand the realms in which Soldiers are required to conduct operations. [29]
(Emphasis added)


Master  Chief  Petty  Officer  of  the  Navy  (MCPON)  (SS/SW)  Rick  D.  West  also  mentioned  the 
possible  threat  to  family  members: 


Anyone who thinks our enemies don’t monitor what our Sailors, families and commands
are doing via the Internet and social media had better open their eyes. These
sites  are  great  for  networking,  getting  the  word  out  and  talking  about  some  of  our 
most  important  family  readiness  issues,  but  our  Sailors  and  their  loved  ones  have  to 
be  careful  with  what  they  say  and  what  they  reveal  about  themselves,  their  families 
or  their  commands.... 

Our  enemies  are  advanced  and  as  technologically  savvy  as  they’ve  ever  been.  They’re 
looking  for  personal  information  about  our  Sailors,  our  families  and  our  day-to-day 
activities  as  well  as  ways  to  turn  that  information  into  maritime  threats.  [30] 


As  the  use  of  social  network  sites  continues  to  increase  throughout  the  DoD  and  among  DoD 
personnel, these threats will only continue to grow. This threat is real, not only to DoD personnel,
but also to their family members and friends.


1.4  Thesis  Goals 

The  primary  objective  of  this  thesis  is  to  determine  the  extent  to  which  DoD  personnel  use 
social  network  sites.  A  secondary  objective  is  to  elevate  awareness  of  the  growing  threat  and 
risks  associated  with  the  use  of  social  network  sites  across  the  DoD  and  among  DoD  personnel. 
We  will  accomplish  these  goals  by  answering  the  following  research  questions: 


•  What percentage of DoD personnel currently hold accounts on Facebook, MySpace, and
LinkedIn?

•  What percentage of DoD personnel do not hold accounts on Facebook, MySpace, and
LinkedIn?

In order to answer these research questions we will propose a method for finding the social
network profiles of DoD personnel. We will then use this method to correlate identity records
stored on DoD411 with Facebook, MySpace, and LinkedIn. Along with the results of our
experiments,  we  will  demonstrate  the  threat  to  the  DoD  by  showing  the  ease  with  which  the 
social  network  profiles  of  DoD  personnel  and  their  family  members  can  be  found.  We  will  also 
provide  examples  of  information  posted  on  social  network  sites  by  DoD  personnel  and  their 
associates  that  identifies  specific  military  units  and  deployment  plans. 
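
The approach rests on names that occur only once within a set of records. A minimal sketch of
that selection step follows, assuming the records are available as a list of (first name, last
name) pairs. The record format, the function name, and the sample names are hypothetical, and
this is not necessarily the procedure used in the later chapters.

```python
from collections import Counter


def names_occurring_once(records):
    """Return the full names that appear exactly once, ignoring case and padding.

    `records` is an iterable of (first_name, last_name) pairs; this layout is
    an assumption made for illustration.
    """
    normalized = [
        f"{first.strip().lower()} {last.strip().lower()}" for first, last in records
    ]
    counts = Counter(normalized)
    return sorted(name for name, n in counts.items() if n == 1)


if __name__ == "__main__":
    # Invented sample records.
    sample = [("John", "Smith"), ("john", "smith "), ("Zebulon", "Quartermaine")]
    print(names_occurring_once(sample))
```

A name that is unique in one data set may still be shared by several people elsewhere, so a
match found this way would need to be checked against other attributes.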

1.5  Thesis  Organization 

The  remaining  chapters  of  this  thesis  will  be  organized  as  follows: 

1.5.1  Chapter  2  Related  Work 

This  chapter  will  give  an  overview  of  the  leading  research  that  has  been  done  in  the  area  of 
online  social  networks.  It  will  cover  several  different  aspects  of  this  research  including  mining 
social  network  sites  for  data,  attacks  using  social  network  sites,  and  privacy  issues  involving 
social  network  sites.  A  brief  overview  of  related  work  in  the  area  of  unusual  names  will  also  be 
given. 

1.5.2  Chapter  3  Approach  and  Contributions 

This  chapter  will  state  the  research  questions  that  this  thesis  will  attempt  to  address.  The  chapter 
will  also  summarize  the  contributions  of  this  thesis  and  the  approach  that  we  followed. 

1.5.3  Chapter  4  Experiments 

The  purpose  of  this  chapter  is  to  provide  a  detailed  accounting  of  the  experiments  conducted 
in  pursuit  of  answers  to  the  primary  research  questions  of  this  thesis.  The  chapter  will  also 
provide  the  results  of  the  experiments,  limitations  that  were  encountered,  and  the  lessons  that 
were  learned  while  conducting  the  experiments. 

1.5.4  Chapter  5  Other  Discoveries  and  Future  Work 

This  chapter  will  present  other  discoveries  that  we  made  through  the  course  of  conducting  our 
experiments.  These  discoveries  do  not  directly  relate  to  the  results  of  the  experiments,  but  are 
important  to  discuss  in  the  context  of  future  research  efforts.  This  chapter  will  also  discuss 
proposed  areas  for  future  research  that  will  extend  the  work  done  in  this  thesis.  These  areas 
include  research  in  the  areas  of  uncommon  names,  compiling  an  online  profile  of  an  individual, 
active attacks using social networks, and research into new policies and education efforts related
to  social  networks. 

1.5.5  Chapter  6  Conclusion 

This  chapter  will  briefly  summarize  the  actual  contributions  of  this  thesis  and  the  conclusions 
that  can  be  made  from  the  results  of  this  research.  It  will  also  discuss  recommendations  for 
actions  that  should  be  taken  to  address  the  concerns  highlighted  by  this  research. 

CHAPTER 2:
Related  Work 


2.1  Extracting  Information  from  Social  Network  Sites 

Gross and Acquisti downloaded 4,540 Facebook profiles belonging to Carnegie Mellon University
(CMU) students in order to gain an understanding of the privacy practices of Facebook
users [31]. At the time of the study (June 2005), Facebook was a college-oriented social networking
site with separate networks for each school. A valid CMU email address was required
for registration and login to the CMU Facebook site. The study found that 62% of undergraduate
students at CMU had a Facebook account. The study also found that CMU students shared
a surprising amount of personal information: 90.8% of the profiles included an image, 87.8%
displayed the owner’s birth date, 39.9% listed a phone number, and 50.8% revealed the user’s
current residence. Most users also revealed other personal information including relationship
status, political views, and personal interests. Gross and Acquisti also found that the vast majority
of users’ Facebook profile names were the real first and last name of the profile owner: 89%
of the profiles tested used a real first and last name matching the CMU email address used to
register the account. Just 3% of the profiles displayed only a first name and the remaining 8%
were obviously fake names.

In  the  same  study,  Gross  and  Acquisti  were  able  to  determine  the  percentage  of  users  who 
changed their default privacy settings. They found that only 1.2% of users changed the default
setting  of  allowing  their  profile  to  be  searchable  by  all  Facebook  users  to  the  more  restrictive 
setting  of  allowing  their  profile  to  be  searchable  only  by  other  CMU  users.  Only  3  of  the  4,540 
profiles  in  the  study  had  a  modified  visibility  setting  from  the  default  of  allowing  the  profile  to 
be  viewed  by  all  Facebook  users  to  a  more  limited  setting  of  allowing  only  CMU  users  access 
to  the  profile. 

Gross  and  Acquisti  concluded  that  due  to  both  the  ease  with  which  privacy  protections  on  social 
networking sites can be circumvented (see [18]) and the lack of control users have over who is
in  their  network  (“Friends  of  Friends”  and  so  forth),  the  personal  information  that  users  reveal 
on  social  network  sites  is  effectively  public  data. 

Bonneau et al. claim that it is difficult to safely reveal limited information about a social
network without allowing for the possibility that more information can be discovered about that
network [32]. They present an example using Facebook, which allows non-Facebook users and
search  engines  to  view  the  public  profiles  of  users.  These  public  profiles  include  a  user’s  name, 
photograph,  and  links  to  up  to  eight  of  the  user’s  “Friends.”  The  eight  “Friends”  appear  to 
be  randomly  selected  from  among  the  user’s  complete  “Friends”  list.  Bonneau  et  al.  wrote  a 
spidering  script  that  was  able  to  retrieve  250,000  public  profile  listings  per  day  from  Facebook 
using  only  a  single  desktop  computer.  At  the  time  of  their  study,  this  would  amount  to  the 
ability  to  retrieve  the  complete  set  of  Facebook  public  listings  with  800  machine-days  of  effort. 
They  then  showed  that,  using  the  limited  information  available  through  public  profile  listings,  it 
was possible to approximate with a high degree of accuracy the common graph metrics of
vertex degree, dominating sets, betweenness centrality, shortest paths, and community detection.
Among  the  privacy  concerns  introduced  by  this  research  is  the  increased  possibility  for  social 
phishing  attacks  using  emails  that  appear  to  come  from  a  friend  of  the  victim  (see  [33]  for  an 
example)  and  the  surprising  amount  of  information  that  can  be  inferred  solely  from  a  user’s 
“Friend”  list,  especially  when  matched  against  another  source  (e.g.,  the  known  supporters  of  a 
political  party). 

Gjoka et al. conducted an experiment in which they were able to crawl Facebook profiles and
obtain data on 300,000 users [34]. They accomplished this by creating 20 Facebook user
accounts and, from each account, exploiting a feature of Facebook that allowed them to repeatedly
query for 10 random Facebook users within the same geographic network as the fake user
account (see footnote 2).


2.2  Attacks  on  Social  Network  Sites 

Jagatic  et  al.  showed  that  university  students  were  more  likely  to  divulge  personal  information 
in  response  to  spam  if  it  appeared  that  the  spam  came  from  someone  they  knew  [33].  They  set 
out  to  answer  the  question  “How  easily  and  effectively  can  an  attacker  exploit  data  found  on 
social  networking  sites  to  increase  the  yield  of  a  phishing  attack?”  They  found  several  sites  to 
be  rich  in  data  that  could  be  exploited  by  an  attacker  looking  for  information  about  a  victim’s 
friends. Examples of such sites include MySpace, Facebook, Orkut, LinkedIn, and LiveJournal.
In  order  to  answer  the  question,  the  authors  designed  and  conducted  a  phishing  experiment  in 
which  they  targeted  Indiana  University  students  using  data  obtained  by  crawling  such  social 
network  sites.  They  used  the  data  to  construct  a  “spear-phishing”  email  message  to  each  of 
the targets; these attack messages appeared to come from one of the target’s friends. These
researchers found that 72% of the targets supplied their actual university logon credentials to a
server located outside the Indiana.edu domain in response to the phishing message. Only 16%
of the control group, who received similar emails that did not appear to come from a
friend, fell for the scam. The study also showed that both men and women were more likely to
become victims if the spoofed message was from a person of the opposite gender.

2 At the time of the Gjoka et al. experiment, Facebook still supported regional networks and it
was common for users to belong to a specific geographic network.

Narayanan  and  Shmatikov  discussed  and  proposed  methods  for  re-identifying  nodes  in  an 
anonymized  social  network  graph  [35].  They  validated  their  algorithm  by  showing  that  a  third 
of  the  users  who  have  accounts  on  both  Flickr  and  Twitter  can  be  re-identified  with  only  a  12% 
error  rate.  Their  main  argument  is  that  social  graphs  can’t  be  truly  anonymized  because  it  is 
possible  to  identify  specific  entities  in  the  graph  if  one  has  access  to  the  anonymized  social 
graph  and  access  to  some  auxiliary  information  that  includes  relationships  between  nodes,  such 
as  another  social  network. 

In a separate publication, Narayanan and Shmatikov presented a new class of statistical
de-anonymization attacks which show that removing identifying information from a large dataset
is not sufficient for anonymity [36]. They used their methods on the Netflix Prize dataset,
which contained the anonymous movie ratings of 500,000 Netflix subscribers. By correlating
this anonymous database with the Internet Movie Database, in which known users post movie
ratings, they were able to demonstrate that very little auxiliary information was needed to
re-identify the average record from the Netflix Prize dataset. With only 8 movie ratings, they were
able to uniquely identify 99% of the records in the dataset.

Bilge  et  al.  presented  two  automated  identity  theft  attacks  on  social  networks  [18].  The  first 
attack  was  to  clone  a  victim’s  existing  social  profile  and  send  friend  requests  to  the  contacts  of 
the  victim  with  the  hope  that  the  contacts  will  accept  the  friend  request,  enabling  the  attacker 
to  gain  access  to  sensitive  personal  information  of  the  victim’s  contacts.  The  second  attack  was 
to  find  the  profile  of  a  victim  on  a  social  networking  site  with  which  the  victim  is  registered 
and  clone  the  profile  on  a  site  with  which  the  victim  has  not  registered,  creating  a  forged  profile 
for  the  victim.  Using  the  forged  profile,  the  attacker  sends  friendship  requests  to  contacts  of 
the  victim  who  are  members  of  both  social  networks.  This  second  type  of  attack  is  even  more 
effective  than  the  first  because  the  victim’s  profile  is  not  duplicated  on  the  second  social  network 
site,  making  it  less  likely  to  raise  suspicion  with  the  victim’s  contacts.  Both  attacks  lead  to  the 
attacker  gaining  access  to  the  personal  information  of  the  contacts  of  the  victim. 

In the same paper, Bilge et al. showed that it is possible to run fully automated versions of both
attacks. They created a prototype automated attack system that crawls for profiles on four
different social network sites, automatically clones and creates forged profiles of victims, and sends
invitations to the contacts of the victims. In addition, the system is able to analyze and break
CAPTCHAs (see footnote 3) on the three sites that used CAPTCHAs (StudiVZ, MeinVZ, and
Facebook) with a high enough success rate that automated attacks are practical. On the Facebook
site, which uses the reCAPTCHA system, they were able to solve between 4 and 7 percent of the
CAPTCHAs encountered, which is a sufficient rate to sustain an automated attack since Facebook
does not penalize the user for submitting incorrect CAPTCHA solutions.

As  part  of  implementing  the  second  form  of  attack,  the  authors  had  to  determine  whether  an 
individual  with  an  account  on  one  social  network  already  had  an  account  on  another  social 
network.  Since  there  may  be  multiple  users  with  the  same  name  on  a  given  social  network, 
names  alone  do  not  suffice  for  this  purpose.  The  authors  devised  a  scoring  system  in  which  they 
assigned  2  points  if  the  education  fields  matched,  2  points  if  the  employer  name  matched,  and 
1  point  if  the  city  and  country  of  the  user’s  residence  matched.  Any  instance  in  which  the  two 
profiles  being  compared  ended  up  with  3  or  more  points  was  counted  as  belonging  to  the  same 
user. 
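
The scoring rule described above can be sketched in a few lines of Python. This is only an
illustration of the point scheme as summarized here, not the authors' implementation: the
Profile record, its field names, and the case-insensitive comparison in _same are assumptions
made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # Hypothetical, simplified profile record; field names are illustrative only.
    name: str
    education: str
    employer: str
    city: str
    country: str


SAME_USER_THRESHOLD = 3  # "3 or more points" is counted as the same user


def _same(a: str, b: str) -> bool:
    # Assumption: two fields match if both are non-empty and equal ignoring case.
    return bool(a) and bool(b) and a.strip().lower() == b.strip().lower()


def match_score(p: Profile, q: Profile) -> int:
    """Score two profiles: 2 for education, 2 for employer, 1 for city and country."""
    score = 0
    if _same(p.education, q.education):
        score += 2
    if _same(p.employer, q.employer):
        score += 2
    if _same(p.city, q.city) and _same(p.country, q.country):
        score += 1
    return score


def same_user(p: Profile, q: Profile) -> bool:
    """Treat the two profiles as one person when the score reaches the threshold."""
    return match_score(p, q) >= SAME_USER_THRESHOLD
```

Under this rule a residence match alone (1 point) is never sufficient; at least one of the two
2-point fields must also agree.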

Bilge  et  al.  then  conducted  experiments  with  these  attacks  and  showed  that  typical  users  tend 
toward  accepting  friend  requests  from  users  who  are  already  confirmed  as  contacts  in  their 
friend  list.  After  obtaining  the  permission  of  five  real  Facebook  users,  the  authors  cloned  the 
five  Facebook  profiles  and  demonstrated  an  acceptance  rate  of  over  60%  for  requests  sent  to  the 
contacts  of  the  five  original  accounts  from  the  cloned  accounts  [18]. 

A study conducted in 2007 by Sophos, an IT security company, showed that 41% of Facebook
users accepted a “Friend” request from a fabricated Facebook profile belonging to a green
plastic frog, in the process revealing personal information such as their email address, full birth
date, current address, and details about their current workplace [37]. In 2009, Sophos conducted
another study that involved fabricating Facebook profiles for two female users [38]. Each
profile was then used to send “Friend” requests to randomly selected contacts. Of these requests,
46% and 41%, respectively, were accepted, with most of the accepting users revealing personal
information including email, birth date, and information about family members to the fabricated
profiles.


3 Completely Automated Public Turing test to tell Computers and Humans Apart.
2.3 Social Networking and Privacy

Felt  and  Evans  addressed  the  problem  that  Facebook  and  other  popular  social  network  sites 
allow  third-party  applications  to  access  the  private  information  of  users  [39].  Users  of  the  sites 
have  little  or  no  control  over  the  information  that  is  shared  with  an  application.  The  Facebook 
API  allows  any  application  authorized  by  the  user  to  operate  with  the  privileges  of  the  user,  and 
thus  view  not  only  the  authorizing  user’s  personal  information,  but  also  view  the  profiles  of  the 
user’s  “Friends”  with  the  same  level  of  privilege  as  the  authorizing  user.  Felt  and  Evans  studied 
the  150  most  popular  Facebook  applications  and  found  that  over  90%  of  them  did  not  need  to 
access  the  users’  private  data  in  order  to  function,  showing  that  the  Facebook  API  was  granting 
developers  and  applications  more  access  than  needed  to  personal  user  data. 

In a related paper, Chew et al. discuss three areas of discrepancy between what social network
sites allow to be revealed about users and what users expect to be revealed [40]. Often, users
are not explicitly aware of the information that is being shared with unknown third parties.
One of the areas identified by Chew et al. where users’ privacy could be compromised is
the merging of social graphs by comparing personally identifiable information across multiple
social network sites in order to match up profiles that represent the same individual. This is
especially problematic in situations where an individual uses a pseudonym on one site because
they wish to remain anonymous in the context of that site, but their identity is revealed by
correlating information that can identify them from another site.


2.4  Research  on  Names 

Bekkerman  and  McCallum  presented  three  unsupervised  methods  for  distinguishing  between 
Web  pages  belonging  to  a  specific  individual  and  Web  pages  belonging  to  other  people  who 
happen  to  have  the  same  name  [41].  They  addressed  the  problem  of  determining  which  of  all 
the  Web  pages  returned  by  a  search  engine  for  a  search  on  a  specific  name  belong  to  the  person 
of interest. They used the background knowledge of the names of contacts in the
person-of-interest’s social network and the hypothesis that the Web pages of a group of people who know
each  other  are  more  likely  to  be  related.  The  method  works  by  searching  for  Web  pages  on  each 
name  in  the  social  network,  determining  which  pages  are  related  to  each  other,  and  clustering 
the  related  Web  pages.  One  way  to  define  whether  two  pages  are  related  is  if  they  share  a 
common  hyperlink  or  if  one  of  the  pages  includes  a  hyperlink  to  the  other  page. 
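
The link-based notion of relatedness mentioned above can be illustrated with a short Python
sketch. It implements only the hyperlink criterion stated in this paragraph and groups related
pages into connected components; the grouping step is a stand-in chosen for brevity and is not
claimed to be the clustering procedure used in [41]. The links mapping (page URL to the set of
URLs it links to) is a hypothetical input format.

```python
from collections import defaultdict


def related(page_a, page_b, links):
    """Two pages are related if one links to the other or they share a hyperlink."""
    out_a = links.get(page_a, set())
    out_b = links.get(page_b, set())
    return page_b in out_a or page_a in out_b or bool(out_a & out_b)


def cluster_pages(links):
    """Group pages into connected components of the 'related' relation."""
    parent = {page: page for page in links}

    def find(page):
        # Union-find lookup with path halving.
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    pages = sorted(links)
    for i, a in enumerate(pages):
        for b in pages[i + 1:]:
            if related(a, b, links):
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for page in pages:
        clusters[find(page)].add(page)
    # Largest cluster first; ties broken alphabetically for a stable result.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))
```

In such a sketch, the largest cluster would be the natural candidate for the pages belonging to
the person of interest and their contacts.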

Several random name generators exist on the Web that use the 1990 U.S. Census data to
randomly generate a name. Examples include http://www.kleimo.com/random/name.cfm,
which allows the user to select an obscurity value between 1 and 99 (see Figure 2.1), and
http://www.unled.net/, which generates names based on the frequency of occurrence of the
first and last name in the census population (see Figure 2.2).

2.5  Miscellaneous  Related  Work 

Skeels  and  Grudin  conducted  a  study  of  Microsoft  employees  in  early  2008  to  determine  the 
extent  to  which  the  employees  used  social  network  sites  and  how  they  used  those  sites  in  the 
workplace [42]. They found that LinkedIn was used mostly by younger employees seeking to
build  and  maintain  professional  connections,  while  Facebook  was  predominantly  used  for  social 
interactions  with  family,  friends,  and  co-workers.  With  Facebook  in  particular,  many  users  were 
more  wary  of  the  content  they  posted  online  after  learning  that  co-workers  and  supervisors 
were  also  seeing  their  posts.  Some  workers  were  hesitant  to  ignore  a  “Friend”  request  from 
a  supervisor  but  uncomfortable  with  allowing  their  boss  into  their  network  of  “Friends.”  One 
of  the  employees  interviewed  summarized  some  of  the  issues  with  the  question  “If  a  senior 
manager  invites  you,  what’s  the  protocol  for  turning  that  down?” 
[Figure 2.1 image: screenshot of "The Random Name Generator" page, showing gender and quantity
options, an obscurity factor setting (1 = common, 50 = not so common, 99 = totally obscure), and
a list of ten generated names.]

Figure 2.1: Kleimo Random Name Generator, http://www.kleimo.com/random/name.cfm,
generates random names using 1990 U.S. Census data. The site allows the user
to select an obscurity value from 1 to 99. The site does not say how the obscurity of a name
is determined, but it presumably uses the frequency data included with the census data, which
provides the frequency of occurrence of the first and last names in the census population.
[Figure 2.2 image: screenshot of the "Random American Names" page, showing example male and
female names generated both "based on percentage" and "not based on percentage" from 1990 U.S.
Census Bureau data, followed by a form for user-submitted names.]

Figure 2.2: Unled Random Name Generator, http://www.unled.net/, is another Web-based
random name generator that uses 1990 U.S. Census data. Presumably, “based on percentage”
means that the frequency information for each first and last name included in the census
data is used in the selection of a first and last name pair. However, the site does not give specific
details on how this frequency data is used.
CHAPTER 3:

Approach  and  Contributions 


3.1  Approach 

We have listed two research questions that we will attempt to answer in pursuit of the objectives
of this thesis, which are to find out how prevalent the use of social network sites by DoD
personnel is and to elevate awareness of the privacy and operational implications that social
network sites have for the DoD. Our approach to answering the research questions will be to perform
experiments designed to statistically determine the percentage of DoD personnel participating
in three popular social network sites.

Our first step will be to propose a method for finding the social network profiles of DoD
personnel. This method will consist of choosing an uncommon name from the DoD411 directory,
then searching for that name on a social network site.

We  will  then  propose  three  different  methods  for  randomly  choosing  uncommon  names  from 
a  directory.  We  need  to  choose  the  names  randomly  so  that  we  can  use  statistical  sampling  to 
infer  results  about  the  entire  population  of  the  directory  from  our  sample  set. 

Our  next  step  will  be  to  compare  the  different  methods  for  choosing  an  uncommon  name  from  a 
directory  to  test  their  effectiveness  at  finding  uncommon  names.  We  will  do  this  by  comparing 
the  names  chosen  using  the  three  methods  with  an  outside  independent  source. 

Then, we will compile a sample of randomly chosen uncommon names from the DoD411
directory and search for those names on three social network sites. We expect that since the names
we are searching for are uncommon, we will be able to easily distinguish the social network
profiles for those names. We will then count the number of matches on each social network site
for each of the uncommon names and use the results to estimate the percentage of DoD
personnel with accounts on those social network sites. We will also be able to estimate the percentage
of DoD personnel without accounts on those social network sites.
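
The estimation step described above can be sketched as follows. This is a minimal illustration
that assumes a simple random sample and uses the normal-approximation (Wald) confidence
interval; the choice of interval and the default z = 1.96 (roughly 95% confidence) are
assumptions made for the example, not details taken from this chapter.

```python
import math


def estimate_proportion(matches: int, sample_size: int, z: float = 1.96):
    """Return (estimate, lower bound, upper bound) for a population proportion.

    matches     -- number of sampled names with a matching profile on a site
    sample_size -- number of uncommon names searched on that site
    z           -- normal quantile; 1.96 is assumed for roughly 95% confidence
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= matches <= sample_size:
        raise ValueError("matches must be between 0 and sample_size")
    p = matches / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    # Clamp the interval to the valid range for a proportion.
    return p, max(0.0, p - margin), min(1.0, p + margin)
```

The proportion without accounts is the complement of the estimate, so the same interval can be
reused by subtracting each bound from 1.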

We will not use member accounts on the social network sites for our searching, but instead will
access the sites as a regular Internet user without any affiliation with the sites. This way we can
demonstrate the availability of profile information to any Internet user. We also believe that this
will better approximate automated attacks in which large numbers of social network profiles are
retrieved.

Name Variation          Example
“First Last”            John Smith
“First M Last”          John R. Smith
“First Middle Last”     John Robert Smith

Table 3.1: Name variations used in searches.


3.2  Contributions 

The main contribution of this thesis is to demonstrate an ability to identify social network
accounts of DoD employees. We present a technique for finding highly identifiable individuals
that can be used to automatically assemble a person’s Internet footprint. We also perform
experiments designed to accurately determine the percentage of DoD employees and warfighters
having accounts on Facebook, LinkedIn, and MySpace and the percentage of DoD employees
that do not have accounts on those sites.

3.2.1  Definitions 

Names  are  labels  that  are  assigned  to  individuals  and  groups  to  help  distinguish  and  identify 
them.  In  most  Western  cultures,  first  names,  or  given  names,  are  generally  used  to  identify 
individual  people  within  a  family  group  and  last  names,  or  surnames,  are  used  to  identify  and 
distinguish  family  groups.  Middle  names  are  also  often  given  to  help  distinguish  individuals 
within a family group. The combination of a first, middle, and last name constitutes an
individual’s full or personal name. Throughout the rest of this thesis, we will refer to this combination
of first, middle, and last names as a full name. Since we will sometimes need to distinguish
between different combinations of a full name, we will also use the three name variations shown
in Table 3.1.

While  we  would  like  to  use  full  names  to  distinguish  between  individuals,  in  a  large  society  that 
is  not  often  possible.  Some  names  are  more  common  than  others  and  many  different  individuals 
might  all  have  the  same  name.  Other  names  are  less  common,  so  fewer  individuals  share  those 
names.  In  some  cases,  a  name  might  be  so  uncommon  that  it  distinguishes  an  individual  within 
an entire country, or even the entire world (see Figure 3.1).




Figure  3.1:  Some  name  labels  are  more  common  and  are  shared  by  many  individuals.  Other 
name  labels  are  shared  by  only  one  or  a  few  individuals.  Thesis  advisor’s  name  used  with 
permission. 

We define an uncommon name in general as any name that belongs to fewer than some specified
number of individuals, N, within a given group. For practical purposes, we define an uncommon
name as any name that appears in a directory fewer times than some threshold T. For the
remainder of this thesis, we will set T = 2 and we will use DoD411 as the directory of interest.
Any name that appears in the DoD411 directory 0 or 1 times will be considered uncommon.
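
The practical definition above translates directly into a counting test. The sketch below is
illustrative only: it assumes the directory contents are available as a list of full-name
strings, and the normalization in _normalize (trimming and lowercasing) is an assumption of the
example.

```python
from collections import Counter


def _normalize(name: str) -> str:
    # Assumption: names are compared ignoring case and extra whitespace.
    return " ".join(name.split()).lower()


def count_names(directory_names) -> Counter:
    """Count how many times each full name appears in the directory."""
    return Counter(_normalize(name) for name in directory_names)


def is_uncommon(name: str, counts: Counter, threshold: int = 2) -> bool:
    """A name is uncommon if it appears fewer than `threshold` (T) times.

    A Counter returns 0 for a name that is absent, so with T = 2 both
    0 and 1 occurrences are treated as uncommon.
    """
    return counts[_normalize(name)] < threshold
```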

We  make  a  distinction  between  the  term  “directory”  and  the  term  “social  network  site.”  We  will 
use  the  term  “directory”  to  refer  to  an  online  database  of  contact  information  for  a  specific  group 
of  people.  DoD411  is  an  example  of  such  a  directory  that  can  be  accessed  via  a  Web  interface 
or  using  LDAP.  We  will  use  the  term  “social  network  site”  to  refer  to  sites  in  which  users  can 
create  their  own  profile  and  make  connections  with  other  users.  Facebook  is  an  example  of  a 
social  network  site. 

3.2.2  Why  Uncommon  Names? 

One of the purposes of this thesis is to demonstrate an ability to identify social network profiles
belonging to DoD employees and to get an accurate assessment of the number of DoD
employees using popular social networking sites. A central feature of most social networking sites is
the ability to search for other members. The primary method of searching for other members is
searching by a personal name. However, a large proportion of personal names are too common
to be used for uniquely identifying an individual. For example, a search for the name “Kenneth
Phillips” on Whitepages.com results in 1,331 matches within the United States (see footnote 4).

3.2.3  Methods  for  Choosing  Uncommon  Names  from  a  Directory 

Because  our  experiments  will  involve  searching  for  social  networking  profiles  of  individuals 
whose  names  we  retrieve  from  a  directory,  we  need  a  way  to  choose  individuals  whose  names 
are  likely  to  uniquely  identify  them.  By  using  only  names  that  are  uncommon,  we  increase  the 
likelihood  that  any  results  found  for  a  name  are  associated  with  and  belong  to  the  individual 
for  whom  we  are  searching.  In  this  section  we  propose  three  different  methods  for  randomly 
choosing  uncommon  names  that  appear  in  a  directory.  We  want  to  choose  names  randomly  so 
that  statistics  calculated  from  the  random  sample  will  be  representative  of  the  population  as  a 
whole.  There  are  three  primary  reasons  for  which  a  name  may  be  uncommon: 

1. Names in which the given name(s) and the surname come from different cultural or ethnic
origins,  resulting  in  an  uncommon  combination  that  forms  an  uncommon  full  name. 

2.  Given  names  that  are  uncommon  or  novel  on  their  own,  resulting  in  an  uncommon  full 
name. 

3.  Surnames  that  are  uncommon  due  to  small  family  size,  combining  surnames  in  marriage, 
or  other  reasons. 

Our  three  proposed  methods  each  take  advantage  of  one  or  more  of  these  reasons.  See  Table 
3.2  for  a  comparison  of  the  methods. 

Method                   Preconditions                  Advantages   Disadvantages
Randomized Combination   List of first names            Simple       Many queries required
                         List of last names             [...]        for each result
                         Directory that can [...]       [...] represent a real person
Filtered Selection       [...]                          Simple       [...]
Exhaustive Search        Name property must be          Complete     Consumes resources
                         capable of querying
                         Directory allows exhaustive
                         set of queries

Figure 3.2: Comparison of the different techniques for randomly choosing uncommon names
from a directory.

4 http://names.whitepages.com/kenneth/phillips

3.2.4 Method 1: Randomized Combination

This method takes a list of first names and last names, randomly combines them to create a full
name, and queries the full name against a large directory. If the result of the query is a single
name, the name is deemed to be uncommon. A prerequisite for this method is that we have
a large list of first and last names and a directory that can be queried by name. For any large
list of names, any name that appears on the list may or may not be uncommon on its own. So
any given first name and last name might not on their own be uncommon, but when combined,
if they are from different ethnic origins, the chances are greatly increased that their combination
results in an uncommon full name. The main disadvantage of this method is that it requires
many queries to the directory for each uncommon name found.
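
Method 1 can be sketched as a simple loop. The code below is a minimal illustration under
stated assumptions: query is a hypothetical callback that returns the number of directory
entries matching a first and last name (for DoD411 this would wrap a Web or LDAP lookup, which
is not shown), and max_queries is an illustrative cap that is not part of the method itself.

```python
import random
from typing import Callable, Iterable, Optional, Tuple


def randomized_combination(
    first_names: Iterable[str],
    last_names: Iterable[str],
    query: Callable[[str, str], int],
    max_queries: int = 1000,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[Tuple[str, str]], int]:
    """Randomly pair first and last names until one matches exactly one entry.

    Returns ((first, last), queries_used), or (None, max_queries) if no
    uncommon name was found within the query budget.
    """
    rng = rng or random.Random()
    firsts, lasts = list(first_names), list(last_names)
    for attempt in range(1, max_queries + 1):
        first, last = rng.choice(firsts), rng.choice(lasts)
        # Exactly one directory hit means the name is deemed uncommon.
        if query(first, last) == 1:
            return (first, last), attempt
    return None, max_queries
```

The second element of the return value makes the main disadvantage visible: most random
combinations match no one, so many queries are typically spent per uncommon name.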

3.2.5  Method  2:  Filtered  Selection 

This  method  randomly  selects  a  full  name  from  a  directory  and  checks  the  first  and  last  name  for 
membership  in  a  list  of  common  first  and  last  names.  The  specific  method  of  selecting  a  name 
randomly  from  a  directory  would  depend  on  the  specific  directory,  but  could  include  queries 
for a unique identification number (as in the DoD411 directory’s “employeeNumber” field) or
queries  for  a  first  or  last  name  using  wildcard  characters  mixed  with  different  combinations  of 
letters.  If  either  the  first  or  last  name  does  not  appear  on  the  name  lists,  the  name  is  considered 
to  be  uncommon.  As  with  Method  1,  a  prerequisite  for  this  method  is  a  large  list  of  common 
first  and  last  names.  One  advantage  to  this  method  is  that  “bulk”  queries  can  be  made  to  the 
directory  to  get  a  list  of  names  up  to  the  size  limit  allowed  by  the  directory,  thereby  reducing  the 
total  number  of  queries  made  to  the  directory.  The  small  number  of  queries  makes  this  method 
faster  than  the  other  two  methods.  The  disadvantage  to  this  method  is  that  it  does  not  query  the 
directory  to  make  sure  the  name  only  appears  once,  so  names  generated  using  this  method  are 
only  uncommon  with  respect  to  the  list  of  common  first  and  last  names.  If  the  list  is  not  very 
comprehensive,  then  the  names  selected  using  this  method  might  not  be  as  uncommon  as  those 
selected  using  other  methods. 
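
The membership test at the heart of Filtered Selection can be sketched in a few lines of
Python. This is a minimal illustration and not the implementation from the appendix: the
function names, the tiny stand-in name lists, and the sample batch are all hypothetical, and
the sketch assumes the first and last words of a record are the first name and surname.

```python
def is_uncommon(full_name, common_first, common_last):
    """Return True if either the first or the last name is missing from the common lists."""
    parts = full_name.upper().split()
    first, last = parts[0], parts[-1]
    return first not in common_first or last not in common_last


def filter_uncommon(names, common_first, common_last):
    """Keep only the names judged uncommon relative to the supplied lists."""
    return [n for n in names if is_uncommon(n, common_first, common_last)]


# Hypothetical stand-ins for the common-name lists and for one bulk query result.
common_first = {"JOHN", "MARY"}
common_last = {"SMITH", "JONES"}
batch = ["John Smith", "Zbigniew Smith", "Mary Okonkwo", "Mary Jones"]

uncommon = filter_uncommon(batch, common_first, common_last)
```

Because the check never goes back to the directory, a name kept by this filter is uncommon
only relative to the lists supplied, which is the limitation noted above.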

3.2.6 Method 3: Exhaustive Search

This  method  is  also  based  on  the  second  and  third  reasons  for  an  uncommon  name.  We  begin  by 
choosing  some  property  of  a  full  name  for  which  we  can  query  a  directory.  We  then  repeatedly 
query  the  directory  for  names  with  that  property  until  we  have  retrieved  a  complete  list.  As 
an  example,  we  could  choose  the  property  that  the  surname  begins  with  “A”.  We  would  then 
retrieve  all  names  on  the  directory  with  a  surname  beginning  with  “A”.  Next,  we  generate  a 
histogram  of  first  names  and  last  names  in  our  list  of  names  and  any  names  that  appear  in  the 
list  fewer  times  than  some  threshold  T  are  marked  as  uncommon.  In  this  manner  we  can  find 
all  of  the  uncommon  names  in  a  directory  with  any  given  property,  so  long  as  the  property  we 
wish  to  search  for  is  something  for  which  we  can  construct  a  query  to  the  directory.  We  could, 
for example, find all uncommon first names with the property that the surname is “Smith”. Note
that we could also exhaustively retrieve the entire list of names in the directory and thus
find every uncommon name in it. Downloading the entire directory takes more time and effort,
but does not require an auxiliary name list.
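
The histogram step can be sketched as follows. This is an illustrative sketch rather than the
thesis code: the function name and sample records are hypothetical, and it assumes that a name
counts as uncommon when it appears at most T times, which matches the later use of T = 1 to
obtain names that are unique in the retrieved list.

```python
from collections import Counter


def uncommon_names(records, threshold=1):
    """Histogram first names and surnames; return those seen at most `threshold` times."""
    first_counts = Counter(first for first, _ in records)
    last_counts = Counter(last for _, last in records)
    rare_first = sorted(n for n, c in first_counts.items() if c <= threshold)
    rare_last = sorted(n for n, c in last_counts.items() if c <= threshold)
    return rare_first, rare_last


# Hypothetical result of exhaustively querying for one property (surname starts with "G").
records = [("Ann", "Garcia"), ("Ann", "Gomez"), ("Bo", "Garcia"), ("Cy", "Gruber")]
rare_first, rare_last = uncommon_names(records, threshold=1)
```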

CHAPTER 4:
Experiments 


4.1  Comparing  Methods  for  Finding  Uncommon  Names 

In this section, we describe the experiment performed to compare the “uncommonness” of
names chosen using the three methods proposed in Section 3.2.3 to determine which method is
most effective for choosing uncommon names.

We  begin  the  experiment  by  using  the  methods  proposed  in  Section  3.2.3  to  compile  three 
separate lists of uncommon names. Each of the three methods requires a directory, so we chose
DoD411. For the name list, we used the name lists from the U.S. Census Bureau5, which were
composed  based  on  a  sample  of  7.2  million  census  records  from  the  1990  U.S.  Census  [43]. 
The  surname  list  from  1990  contains  88,799  different  surnames.  The  first  name  lists  contain 
1,219  male  first  names  and  4,275  female  first  names. 

4.1.1  Using  Randomized  Combination  (Method  1) 

In  order  to  use  the  Randomized  Combination  method  to  compile  a  list  of  random  names,  we 
require  a  list  of  first  and  last  names.  The  more  extensive  the  list,  the  better. 

We found that most of the names generated using the Census Bureau lists were so uncommon
that they did not appear on DoD411 at all. In one test, we generated 828 names, but only
20 of them appeared on DoD411, a 2.4% hit rate. In practice, we modified this method to
generate names using only a random first initial combined with a randomly drawn last name,
which worked because DoD411 allows queries involving wildcards. Using this method, it took
55 minutes to retrieve 1,000 uncommon names from DoD411. We generated 1,610 names, of
which 1,223 appeared on DoD411, for a hit rate of 76%. Of the 1,223 names that appeared on
DoD411, 1,000 of them (81.7%) appeared only once on DoD411 (excluding middle names and
generational identifiers). See Appendix 6.1, 6.2, and 6.3 for our implementation of this method.
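
As a rough illustration of the modified method, the sketch below builds a wildcard LDAP
filter from a random first initial and a random surname, and computes a hit rate. It is not
the code from the appendix. The attribute names `sn` and `givenName` are standard LDAP
attribute types, but their use by DoD411 is an assumption here, and the function names are
hypothetical. Escaping of special characters in names is omitted for brevity.

```python
import random


def random_probe(first_names, surnames, rng):
    """Build an LDAP filter from a random first initial and a randomly drawn surname."""
    initial = rng.choice(first_names)[0]
    surname = rng.choice(surnames)
    # Assumed attribute names: sn (surname) and givenName (first name).
    return "(&(sn=%s)(givenName=%s*))" % (surname, initial)


def hit_rate(found, generated):
    """Percentage of generated names that appeared in the directory."""
    return 100.0 * found / generated


rng = random.Random(0)
probe = random_probe(["JAMES"], ["SMITH"], rng)

full_name_rate = hit_rate(20, 828)      # full first name plus surname
initial_rate = hit_rate(1223, 1610)     # first initial plus surname
```

With the figures reported above, the two hit rates come out at roughly 2.4% and 76%.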

4.1.2  Using  Filtered  Selection  (Method  2) 

Like the previous method, this method requires a list of first and last names, and we again
used the 1990 Census name lists. These lists are ideal for this method because of the way in
which they were composed. First, the lists are based on a sample of 7.2 million census records,
so any names uncommon enough that they don’t appear in the 7.2 million records are not on the
lists. Second, names that were part of the 7.2 million records but that occurred with low
frequency were also not included in the lists. According to the documentation provided with
the lists, a name that does not appear on the lists can be considered “reasonably rare” [43].
The documentation also states that for purposes of confidentiality, the names available in
each of these lists are restricted to the minimum number of entries that contain 90 percent of
the population for that list. Names occurring with the lowest frequency are therefore excluded
from the lists, which is desirable for our purposes.

5 See http://www.census.gov/genealogy/www/data/1990surnames/index.html and
http://www.census.gov/genealogy/www/data/2000surnames/index.html

Our implementation of this method appears in Appendix 6.2 and 6.4. Using DoD411 as the
directory, we were able to retrieve 1,761 uncommon names in 53 minutes on January 27, 2010.
We achieved this by querying for lists of 100 names at a time, beginning with names containing
the letter ‘a’ in the first name and ‘a’ in the last name, then ‘a’ and ‘b’, and so on up to
‘z’ and ‘z’.
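
The sequence of bulk queries described above can be sketched as a generator over letter
pairs. This is an illustration only: the LDAP attribute names `givenName` and `sn` are
assumed, the function name is hypothetical, and the limit of 100 results per query would be
set on the search request itself, which is not shown.

```python
from itertools import product
from string import ascii_lowercase


def letter_pair_filters():
    """Yield one wildcard filter per (first-name letter, last-name letter) pair."""
    for first_letter, last_letter in product(ascii_lowercase, repeat=2):
        # Assumed attribute names; matches names containing each letter anywhere.
        yield "(&(givenName=*%s*)(sn=*%s*))" % (first_letter, last_letter)


filters = list(letter_pair_filters())
```

The pairs run from ‘a’/‘a’ through ‘z’/‘z’, giving 26 x 26 = 676 queries in total.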

4.1.3  Using  Exhaustive  Search  (Method  3) 

Using the process described in Method 3, we retrieved all names on DoD411 with the property
that the surname begins with the letter “G”. Using a threshold T = 1, we generated a histogram
of these names which resulted in 9,942 uncommon first names and 9,285 uncommon surnames.
Since we used a threshold of T = 1, all of the uncommon surnames are unique on DoD411.
This is not necessarily the case with the uncommon first names retrieved using this method
because a first name that is unique in a list of “G” surnames might appear in a list of full names
in which the surname begins with some other letter.

4.1.4  Using  an  Outside  Source  for  Comparison  of  the  Three  Methods 

Whitepages.com provides the ability to search for contact information using a first and
last  name,  much  like  the  white  pages  of  a  traditional  phone  book,  except  that  it  returns  matches 
from  the  entire  U.S.  Whitepages.com  provides  any  other  known  information  for  each  matching 
person,  including  phone  number,  address,  age,  employer,  the  names  of  household  members, 
links  to  Facebook  and  Twitter  pages,  a  link  to  a  listing  of  neighbors,  and  a  map  showing  the 
location  of  their  house.  In  addition  to  providing  contact  information,  Whitepages.com  also 
provides  “name  facts,”  which  include  a  name’s  origin,  variants,  nicknames,  distribution  across 
the  U.S.  by  state,  a  histogram  showing  the  number  of  recent  searches  for  the  name,  ranking  of 
the  first  and  last  name  in  the  U.S.,  and  the  number  of  people  in  the  U.S.  with  that  name. 

We used Whitepages.com to perform an experiment designed to compare the effectiveness of
each of our three methods for finding uncommon names. The experiment consisted of looking
up 1,000 names found by each of the three methods on Whitepages.com and retrieving the
reported number of people in the U.S. with that name. We assumed that uncommon names
would result in a very low number of matches and common names would result in a high
number of matches. We expected that the most effective of the three methods would yield a
large share of names with 0 or 1 matches. If any one of the sets of names resulted in many
matches for a significant portion of the names, then the method used to generate that set
would be deemed ineffective.

We performed this experiment on April 27, 2010 using the code in Appendix 6.5. For comparison,
we randomly retrieved 1,000 names from the DoD411 server without regard to whether
they were uncommon. The histograms for each of the three methods and the randomly selected
set are shown in Figure 4.1. Based on the histograms, Method 3 is the most effective method for
selecting uncommon names. All of the names in the Method 3 list had 8 or fewer matches
and more than 75% of them had either zero or one match. Just under 50% of the names in the
Method 1 list had zero or one match and about 60% of the Method 2 names had zero or one
match. In comparison, only about 15% of the names in the randomly selected set had zero or
one match and the rest had between three and 21,394 matches.

A statistical summary of each list of 1,000 names is shown in Table 4.1. This table clearly shows
that Method 3 has the highest number of 0 or 1 matches, meaning that Method 3 selected the
best set of uncommon names. In comparison to the randomly selected set, all three methods
were effective at selecting uncommon names. Since Method 3 takes more time and resources
to select uncommon names, we will use names generated using Method 1 for the remaining
experiments.
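
The four summary columns of Table 4.1 are simple to compute from a list of per-name match
counts. The sketch below shows one way to do so; the function name and the sample counts are
hypothetical and are not data from the experiment.

```python
def summarize(match_counts):
    """Return min, max, mean, and the number of names with 0 or 1 matches."""
    return {
        "min": min(match_counts),
        "max": max(match_counts),
        "mean": sum(match_counts) / len(match_counts),
        "zero_or_one": sum(1 for c in match_counts if c <= 1),
    }


# Hypothetical match counts for five names.
summary = summarize([0, 1, 1, 3, 8])
```

A method is judged more effective when its "zero_or_one" count is higher and its mean and
maximum are lower.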

4.2 Determining Percent of DoD Using LinkedIn

The purpose of this experiment was to determine the percentage of DoD personnel that have
LinkedIn pages without surveying the DoD personnel. To make this determination, we used
randomly chosen uncommon names drawn from DoD411 as probes to search publicly available
LinkedIn profiles. We assume that individuals with uncommon names are likely to have
LinkedIn pages with the same frequency as individuals with common names, but because these
names are uncommon it is easier for us to identify them with high confidence.

Figure 4.1: Histograms comparing the three uncommon name selection methods. 1,000 names
were selected using each of the three methods, then we queried Whitepages.com to determine
the number of people in the U.S. with each name. The histograms show counts for the number
of people who share the same name. The fourth histogram is composed of 1,000 names selected
at random, without bias to whether they are uncommon. We are looking for methods that show
a peak at 0 or 1 match. The first bin in each histogram represents the count for 0 and 1 match.
All three selection methods do better than random selection. The best method is Exhaustive
Search. 75% of the 1,000 names selected using this method had 0 or 1 match, compared with
random selection, in which only about 15% had 0 or 1 match. 48% of the names selected using
Method 1 had 0 or 1 match and 58% of those selected using Method 2 had 0 or 1 match.

                                     Number of matches on Whitepages.com per name
                                     Min     Max       Mean      0 or 1 Matches
Method 1, Randomized Combination     0       231       4.86      470
Method 2, Filtered Selection         0       1360      8.25      583
Method 3, Exhaustive Search          0       8         1.41      766
Randomly Selected Names, No Bias     0       21394     481.91    161

Table 4.1: Summary statistics for three methods of selecting uncommon names. The most
effective method for generating uncommon names is the method with the highest number of
0 or 1 matches, which means that more of the 1,000 names selected using that method were
reported by Whitepages.com as representing 0 or 1 people in the entire U.S. We know that
each name in the lists represents at least 1 person in the U.S. because we got each name from
DoD411, but names reported as having 0 matches by Whitepages.com are so uncommon that
Whitepages.com doesn’t know about them.


4.2.1  Experimental  Setup 

In preparing for this experiment, we needed to determine the best method to conduct an
automated search for LinkedIn member profiles. The two options that we compared and considered
were the LinkedIn public search page and Google. We chose not to perform an automated
search using the LinkedIn search page as an authenticated LinkedIn member.

We first tested the LinkedIn public search tool on the LinkedIn homepage, which allows
unauthenticated visitors to search the public profiles of LinkedIn members by entering a first and
last name or by browsing through an alphabetical directory listing (Figure 4.2). We found
that this public search page returns limited and incomplete results. For example, we searched
for the common name “John Smith.” Using the LinkedIn public search page resulted in only
30 matches, but the same search performed while signed in as a LinkedIn member resulted in
5,336 matches (LinkedIn members with a free personal account can view only the first 100
of these matches). Based on these tests, we conclude that LinkedIn’s public search tool returns
incomplete results.

A second limitation of the LinkedIn public search page is that it only allows searching by first
and last name. There is no provision for including a middle name, professional title, or any
other search terms or options. An attempt to search for “John R Smith” by placing “John R” in
the first name search box or placing “R Smith” in the last name search box resulted in the same
list of 30 names as a search for “John Smith.” In contrast, a search for “John R Smith” using
the member-only search page, which does allow searching for a middle name, resulted in a list

Figure 4.2: LinkedIn public search page. The page offers a search-by-name form (First Name,
Last Name) and an alphabetical member directory.


Option    Value    Purpose
v         1.0      Mandatory option.
rsz       large    Returns 8 results at a time instead of 4.
hl        en       Returns only English language pages.
filter    0        Prevents filtering out of similar results.
start     0        Results are returned starting at item 0. Increment by 8 for
                   subsequent results.
q         "john+r+smith"+-/updates+-/dir+-/directory+-/groupInvitation+site:www.linkedin.com
                   Query portion of URL.

Table 4.2: Google AJAX search options for retrieving LinkedIn profiles

of only 11 matches, ten of which were for profile names that exactly matched “John R Smith.”
The 11th result had a nickname inserted in between “R” and “Smith,” but was still for someone
named “John R Smith.” Due to these limitations, we ruled out using the LinkedIn public search
tool and decided to use Google, which indexes LinkedIn profile pages.

We fine-tuned our query to Google based on experimentation and manual inspection of searches
for several different names. We found that by using the search options6 shown in Table 4.2 and
by constructing the query string so as to exclude results found in the “updates,” “dir,”
“directory,” and “groupInvitation” subdirectories on LinkedIn7, we were able to obtain the
desired results. Our resulting URL for a query using the Google AJAX API was as follows:

http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&hl=en&filter=0&q="john+r+smith"+-/updates+-/dir+-/directory+-/groupInvitation+site:www.linkedin.com&start=0

6 See http://code.google.com/apis/ajaxsearch/documentation for full list of options.

7 Results that originated within these excluded LinkedIn directories were not profile pages, but
rather directory listings or invitations for group pages.
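
A query URL of this shape can be assembled programmatically. The sketch below is an
illustration and not the script from the appendix; the function and constant names are
hypothetical. It uses Python's standard `urlencode`, which percent-encodes the quotation marks,
slashes, and colon in the query, so the resulting string differs textually from the URL shown
above while carrying the same parameters. The Google AJAX Search API it targets is the
historical service used at the time of the experiment.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://ajax.googleapis.com/ajax/services/search/web"
EXCLUDED_DIRS = ["updates", "dir", "directory", "groupInvitation"]


def google_query_url(name, start=0):
    """Build a search URL for LinkedIn profile pages matching a quoted name."""
    terms = ['"%s"' % name.lower()]
    terms += ["-/%s" % d for d in EXCLUDED_DIRS]   # exclude non-profile directories
    terms.append("site:www.linkedin.com")
    params = [
        ("v", "1.0"),
        ("rsz", "large"),
        ("hl", "en"),
        ("filter", "0"),
        ("q", " ".join(terms)),
        ("start", str(start)),   # increment by 8 for each further page of results
    ]
    return BASE_URL + "?" + urlencode(params)


url = google_query_url("John R Smith")
params = parse_qs(urlsplit(url).query)
```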

To validate our decision to use the Google search engine, we manually compared search results
obtained using Google with those obtained using LinkedIn’s member-only search page and
found the results to be nearly identical. Going back to our example name of “John R Smith,”
we found that Google returned 10 of the 11 profile pages listed by LinkedIn’s search engine,
omitting only the result with a nickname inserted between “R” and “Smith.” A similar comparison
on a search for “Nate Phillips” resulted in identical search results from both Google and
LinkedIn.

We  wrote  a  Python  script  (see  Appendix  6.2,  6.6,  and  6.7)  to  automate  a  search  using  the 
following  steps: 

1.  Retrieve  a  name  from  DoD411  by  constructing  an  LDAP  query  consisting  of  a  surname 
randomly  drawn  from  the  U.S.  Census  Bureau  1990  surname  list  and  the  first  letter  of  a 
name  randomly  drawn  from  the  U.S.  Census  Bureau  first  name  list. 

2.  For  each  name  retrieved  in  step  1,  check  whether  any  other  names  appear  on  DoD411 
with  the  same  first  name  and  surname. 

3. If the name appears only once on DoD411, mark it as uncommon and search LinkedIn
for a profile matching that name.

4. For each uncommon name retrieved from DoD411, perform three separate searches using
each of the three name variations shown in Table 3.1.
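
Step 4 can be illustrated with a small helper that produces the three variations named later
in this chapter (“First Last,” “First M. Last,” and “First Middle Last”). The helper is a
hypothetical sketch; the exact forms in Table 3.1 and the handling of records without a middle
name are assumptions.

```python
def name_variations(first, middle, last):
    """Return the search variations for one directory record."""
    variations = ["%s %s" % (first, last)]                          # First Last
    if middle:
        variations.append("%s %s. %s" % (first, middle[0], last))   # First M. Last
        variations.append("%s %s %s" % (first, middle, last))       # First Middle Last
    return variations


with_middle = name_variations("John", "Robert", "Smith")
without_middle = name_variations("John", "", "Smith")
```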

We  began  the  experiment  on  15  November  2009  and  finished  on  16  November  2009,  collecting 
data  for  3,619  uncommon  names.  The  total  running  time  was  less  than  24  hours. 

4.2.2  Validation 

We  manually  verified  a  random  subset  of  our  results  to  validate  our  search  technique.  Our 
validation  method  was  to  choose  36  names  that  resulted  in  0  matches  and  36  names  that  resulted 

Industry Military
Military industry
Military
Government Agency
US Army
Commander
USAF
Defense
Defence
Department of Defense
3d
2d
United States Air Force
United States Naval Academy
DOD

Table 4.3: Keywords indicating DoD affiliation of LinkedIn profile owner (not all-inclusive)


in 1 match and manually search for them using the member-only LinkedIn search page. Of the
names with 0 matches, our automated results were correct in returning 0 matches for 35 of 36.
The remaining name should have been marked as a match but was incorrectly labeled by our
automated tool as not matching due to a non-standard name format returned by DoD411. Of the
names with 1 match, all 36 had a single LinkedIn match. We manually checked each profile to
determine whether its owner was affiliated with DoD. 10 of the 36 profiles
contained words that caused us to conclude that the profile owner was most likely affiliated
with the DoD (see Table 4.3). The remaining 26 profiles were ambiguous with respect to DoD
affiliation.


4.2.3  Results 

We retrieved 3,619 uncommon names from DoD411 and searched for LinkedIn profiles matching
each of those names using Google. 81.8% of the names had zero matching profiles, 11.4%
had exactly one matching profile, and the remaining 6.8% had more than one matching profile
(see Table 4.4). All of the matching profiles with the exception of one were found using a search
for the “First Last” name variation (see Table 3.1). Only one match was found with a search
using the “First M. Last” variation. Based on these results, we believe that between 11% and
18% of DoD personnel have profiles on LinkedIn. We also believe that at least 81% of DoD
personnel do not have profiles on LinkedIn.

Number of Matches    Number of Names    Percent
0                    2962               81.85%
1                    411                11.36%
2                    116                3.21%
3                    64                 1.77%
4                    32                 0.88%
5                    8                  0.22%
6                    9                  0.25%
7                    3                  0.08%
8 or more            14                 0.39%

Table 4.4: Distribution of LinkedIn profile matches for uncommon names.

93.2% of the 3,619 names that we searched for had only zero or one matching profile. Based
on this percentage, we believe that the list of names for which we searched consisted
mostly of uncommon names. Further, we believe that the Randomized Combination method
(Section 3.2.4) used in this experiment for finding uncommon names on DoD411 is a valid and
useful method.
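
The percentages in Table 4.4 and the 93.2% figure follow directly from the counts. The short
sketch below recomputes them from the values in the table; the variable names are
illustrative.

```python
# Counts from Table 4.4, keyed by number of matching profiles.
distribution = {
    "0": 2962, "1": 411, "2": 116, "3": 64, "4": 32,
    "5": 8, "6": 9, "7": 3, "8 or more": 14,
}

total = sum(distribution.values())
percent = {k: round(100.0 * v / total, 2) for k, v in distribution.items()}

zero_or_one = distribution["0"] + distribution["1"]
zero_or_one_pct = round(100.0 * zero_or_one / total, 1)
more_than_one_pct = round(100.0 * (total - zero_or_one) / total, 1)
```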

4.2.4  Limitations  and  Problems  Encountered 

We note two limitations that we discovered with our method of searching for LinkedIn profiles.

1. Our code did not process names returned by DoD411 having more than three words in
the name, as in “John Jacob Smith Jones” or “John R. Smith Jr.” We chose to ignore this
limitation as it did not affect the results of the experiment (assuming that people with four
names use LinkedIn in the same proportion as those with two or three names).

2. We only counted a result returned by Google as a match if the name on the LinkedIn
profile exactly matched the first and last name for which we were searching. This means
that LinkedIn profiles using a shortened version of the name (e.g., Dan for Daniel) or a
nickname were not counted as a match by our search method.

It appears that Google indexes LinkedIn profiles based only on first name and last name, even if
a profile is labeled with first name, middle initial, and last name (e.g., a search for “John Doe”
returns “John A Doe,” but a search for “John A Doe” does not return “John Doe” or “John A
Doe”). In this case, our automated search code would not tally the result as a match. Other
instances in which a valid match would not be counted by our search tool include profile titles
that contain a salutation or professional title (e.g., Dr. or Ms.), a spouse’s name (e.g., “John and
Mary Smith”), or reverse name ordering (e.g., “Smith, John”).
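
The strictness of the exact-match rule is easy to see in a small sketch. The function below
is hypothetical and only approximates the rule described above; in particular, treating the
comparison as case-insensitive is an assumption.

```python
def is_exact_match(profile_title, first, last):
    """Count a profile only when its title is exactly 'First Last'."""
    wanted = "%s %s" % (first, last)
    return profile_title.strip().lower() == wanted.lower()


# Profile titles of the kinds discussed above; only the first one is counted.
titles = ["John Smith", "Dr. John Smith", "John and Mary Smith", "Smith, John", "John A Smith"]
counted = [t for t in titles if is_exact_match(t, "John", "Smith")]
```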

To address the second limitation, we manually reviewed our results for any names for which
Google returned at least one result but for which our tool ignored the result. We found that for
the 3,619 names, only one profile contained a shortened version of the first name, 12 profiles
contained a middle initial, two contained a spouse’s name, one contained a professional title,
and one had reverse name ordering. In total, 17 profiles out of 3,619 names (0.46% of the sample)
were not considered as valid matches by our tool.

One problem that we found and corrected involved Unicode. The names that we retrieved from
DoD411 were in ASCII format. Any names containing non-ASCII characters were returned as
ASCII characters (e.g., “ñ” was returned as “n”). The results from Google were in Unicode and
some names were listed using Unicode characters that did not display properly when converted
to ASCII. To fix this problem, we normalized all results from Google into ASCII characters so
we could properly compare them with the names from DoD411. This was accomplished using
the following line in Python:

title = unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
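
For readers on Python 3, where `encode` returns a bytes object rather than text, the same
normalization can be wrapped as shown below. The wrapper name and the sample names are
illustrative and are not part of the original script.

```python
import unicodedata


def to_ascii(text):
    """Decompose accented characters (NFKD), then drop anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


plain = to_ascii("José Núñez")
```

Characters with no decomposition into an ASCII base letter are simply dropped by this
approach, so it suits comparison of names more than display.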


4.2.5  Lessons  Learned  and  Proposed  Improvements 

Based on the results discussed above, we learned that for the search method we used, it is
unnecessary to search for the three different variations of each name listed in Table 3.1;
searching only for the “First Last” variation was sufficient. Of the 3,619 names, only one
yielded a match on a search for the “First M. Last” variation and none yielded a match when
searching for the “First Middle Last” variation.

Our search method could benefit from several possible improvements. Rather than using a
boolean decision for classifying each profile returned by the search as a positive or negative
match, we could use probabilistic methods to assign each possible match a likelihood of belonging
to the DoD member for whose name we are searching. At least two items could contribute
to determining this likelihood. First, LinkedIn profiles generally show a location for the profile
owner, so the location could be compared with locations generally associated with DoD members.
Common DoD locations would give that profile a higher likelihood of belonging to the
DoD member and being a match. Second, we could search each profile page for words related
to DoD topics, such as those words shown in Table 4.3. Pages containing such keywords would
be given an increased likelihood of representing a match.
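
One way such a score might look is sketched below. Everything in it is an assumption made for
illustration: the weights are arbitrary, the keyword list is a subset of Table 4.3, the
location set is hypothetical, and a simple capped sum stands in for a properly calibrated
probabilistic model.

```python
import re

# A subset of the Table 4.3 keywords, lowercased for matching.
KEYWORDS = ["military", "us army", "usaf", "defense", "department of defense", "dod", "commander"]


def dod_likelihood(profile_text, location, dod_locations,
                   keyword_weight=0.2, location_weight=0.3):
    """Return (score, matched keywords); the score is capped at 1.0."""
    text = profile_text.lower()
    # Whole-word matching so that "dod" does not match inside "dodge".
    hits = [k for k in KEYWORDS if re.search(r"\b%s\b" % re.escape(k), text)]
    score = keyword_weight * len(hits)
    if location.lower() in dod_locations:
        score += location_weight
    return min(score, 1.0), hits


# Hypothetical profile and location list.
dod_locations = {"norfolk, virginia area"}
score, hits = dod_likelihood(
    "Commander, USAF. Previously at Dodge Motors.",
    "Norfolk, Virginia Area",
    dod_locations,
)
```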


4.3  Determining  Percent  of  DoD  Using  Facebook 

The purpose of this experiment was to determine the percentage of DoD personnel that have
Facebook pages. To make this determination, we again used randomly chosen uncommon
names drawn from DoD411 as probes to search publicly available Facebook profiles. As with
the LinkedIn experiment, our hypothesis is that individuals with uncommon names are likely to
have Facebook pages with the same frequency as individuals with common names, but because
these names are uncommon it is easier for us to identify them with high confidence.

4.3.1  Experimental  Setup 

Facebook provides a public search tool at http://www.facebook.com/srch.php that
allows unauthenticated Web users to search for Facebook profiles using a name (see Figure 4.3).
It is important that this search tool allows Web users without Facebook accounts to search for
Facebook profiles because we wanted to use only publicly available methods for our experiment.
Unlike the search tool provided by LinkedIn, both the public and private versions of the
Facebook search tool return similar results.

We compared the results of searches for several different names using both the public search
tool and the private member-only search tool to make sure that the public search tool provided
complete and accurate results. We also tested that the search tool was able to accept and
distinguish all three name variations for which we wished to search (see Table 3.1). As an example
of our tests, we searched for “John R Smith” using both the public and member-only versions
of the search tool. The public tool returned 167 matches while the private tool returned 168
matches. We attribute this discrepancy to a user-selectable privacy option that allows limiting
searches for one’s profile to friends or friends-of-friends only, rather than the default of
everyone8. The returned matches were for profiles with the name “John R Smith” or some variation
thereof, such as “R John Smith” or “John R Smith III.” We noted that the most relevant results
appeared first in the search listing, and variations on the name under search only appeared after
all of the exact matches. We observed similar results in searches for other names and were
satisfied that the public version of the search tool was acceptable for our purposes.

8 See http://www.facebook.com/privacy/explanation.php

Figure 4.3: Facebook public search page. The page offers a “Search by Name” form with a
single “Person's Name” field and notes that search results give only a preview of what is
available on Facebook.

We  discovered  that  the  public  search  tool  limits  the  viewable  results  to  the  first  three  pages, 
i.e.,  the  first  30  matches.  Only  authenticated  members  can  view  more  than  the  first  30  profiles 
returned.  This  limitation  does  not  affect  our  method,  however,  because  we  are  only  interested  in 
Facebook  profiles  for  users  with  unusual  names,  which  by  definition  should  result  in  far  fewer 
matches  than  the  viewable  limit  of  30.  The  private  search  tool  additionally  allows  searching  by 
email  address,  school,  or  company. 

In order to automate our search, we wrote a Python script (see Appendix 6.8) to send queries
to Facebook and extract matching profiles from the Web page returned by Facebook. We were
able to use the public Facebook search tool by using a query of this form:


http://www.facebook.com/srch.php?nm=john+r+smith

We also added a referrer URL and a cookie to our query for reasons discussed further in the
Problems Encountered section.
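
Constructing a query of this form is straightforward with the standard library. The sketch
below is illustrative only: the function name is hypothetical, the referrer URL and cookie
mentioned above are omitted, and the endpoint is the historical one used at the time of the
experiment.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://www.facebook.com/srch.php"


def facebook_search_url(name):
    """Build the public search URL for a name; spaces are encoded as '+'."""
    return SEARCH_URL + "?" + urlencode({"nm": name.lower()})


url = facebook_search_url("John R Smith")
```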

Our Python script performed the automated search using the following steps:

1.  Retrieve  a  name  from  DoD411  by  constructing  an  LDAP  query  consisting  of  a  surname 
randomly  drawn  from  the  U.S.  Census  Bureau  1990  surname  list  and  the  first  letter  of  a 
name  randomly  drawn  from  the  U.S.  Census  Bureau  first  name  list. 

2.  For  each  name  retrieved  from  DoD41 1,  perform  three  separate  queries  to  Facebook  using 
each  of  the  three  name  variations  shown  in  Table  3.1. 

3.  Count  the  number  of  exact  matches  returned  by  Facebook  by  parsing  the  HTML  of  the 
Web  page  returned. 
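
Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the code from
Appendix 6.8: it assumes the three variations are “First Last,” “First M. Last,” and “First Middle
Last” (as named in Section 4.3.5), and it assumes the profile names have already been extracted
from the returned page, since the HTML structure of that page is not described here. Both function
names are hypothetical.

```python
def name_variations(first, middle, last):
    """Return the name variations to query; only 'First Last' if there is no middle name."""
    variations = [f"{first} {last}"]
    if middle:
        variations.append(f"{first} {middle[0]}. {last}")
        variations.append(f"{first} {middle} {last}")
    return variations


def count_exact_matches(returned_names, target):
    """Count returned profile names equal to the target, ignoring case and extra spaces."""
    wanted = " ".join(target.split()).casefold()
    return sum(
        1 for name in returned_names
        if " ".join(name.split()).casefold() == wanted
    )


# Hypothetical list of names parsed from one results page.
results = ["John Smith", "john  smith", "Matt Smith", "John Smith Jr."]
print(name_variations("John", "Robert", "Smith"))
print(count_exact_matches(results, "John Smith"))
```

Under this definition “John Smith Jr.” is not an exact match, which is the behavior discussed
as a limitation in Section 4.3.4.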

We  ran  the  experiment  between  2  September  2009  and  10  September  2009,  collecting  data  for 
1,079  names.  The  total  running  time  for  collecting  this  data  was  only  several  hours,  but,  due  to 
an  unexpected  problem  discussed  in  the  Problems  Encountered  section,  we  were  only  able  to 
run  the  code  for  short  intervals  at  a  time. 

4.3.2  Validation 

We took 50 of the 1,079 names and manually compared our results with those returned by the
private, member-only version of Facebook’s search page. We found that 41 of the 50 names
returned identical results, while searches for the remaining 9 names each resulted in one
additional match beyond those returned by the public search page. This discrepancy can be
attributed to a user-controlled Facebook privacy setting that enables a member to disallow
public search results (see footnote 9). There is also a Facebook privacy setting controlling
which members see a profile in results from the private, member-only search page. Members can
choose from three different options: Everyone, Friends of Friends, and Only Friends. The default
settings are to allow public search results and to show search results to Everyone. Only 9 of
the people represented by the 50 names changed their privacy options to disallow public search
results: over 135 profiles were returned by the public search page for these 50 names, but only
9 additional profiles were added to the results using the private search page.

9. http://www.facebook.com/settings/?tab=privacy#!/settings/?tab=privacy&section=search
Number of Matches    Number of Names    Percent
0                    463                42.91%
1                    280                25.95%
2                    99                 9.18%
3                    43                 3.99%
4                    28                 2.59%
5                    15                 1.39%
6                    16                 1.48%
7                    16                 1.48%
8                    15                 1.39%
9                    16                 1.48%
10 or more           88                 8.16%

Table 4.5: Distribution of exact Facebook profile matches on uncommon names randomly chosen
from DoD411.


We  then  randomly  chose  50  names  from  our  collection  of  1,079  that  resulted  in  exactly  one 
matching  profile.  Of  these  50  names,  13  of  them  could  be  confirmed  as  DoD  members  using 
information  from  their  public  profile  page.  An  additional  9  could  be  confirmed  as  DoD  members 
when  their  profile  page  was  viewed  after  signing  in  as  a  Facebook  member,  bringing  the  total  to 
22  out  of  50  that  could  be  positively  identified  as  belonging  to  the  DoD  member  for  whom  we 
were  searching. 


4.3.3  Results 

We  retrieved  1,079  names  randomly  drawn  from  DoD411  and  searched  for  Facebook  profiles 
matching  those  names  using  Facebook’s  public  search  engine.  42.9%  of  the  names  had  zero 
matching  profiles,  25.95%  had  exactly  one  matching  profile,  and  the  remaining  31.1%  had 
more  than  one  matching  profile.  These  figures  are  only  for  profiles  that  exactly  matched  the 
name.  See  Table  4.5.  We  did  not  count  profiles  as  a  match  if  there  were  slight  differences  in  the 
name,  such  as  “Matt”  for  “Matthew,”  even  though  Facebook  returned  those  as  a  possible  match. 
If  we  count  all  matches  returned  by  Facebook  for  a  particular  name,  then  our  numbers  change 
to  only  32.3%  with  zero  matching  profiles,  22.5%  with  exactly  one  match,  and  the  remaining 
45.1%  with  more  than  one  matching  profile.  Based  on  these  results,  we  estimate  that  at  least 
43%  of  DoD  personnel  do  not  have  accounts  on  Facebook  and  that  between  25%  and  57%  of 
DoD  personnel  do  have  a  Facebook  account. 
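
The arithmetic behind these estimates can be reproduced from the counts in Table 4.5. The sketch
below reflects one reading of the bounds, stated here as an assumption: the lower bound counts
only names with exactly one exact match, the upper bound counts every name with at least one
exact match, and names with zero matches give the share without accounts. The function name is
hypothetical.

```python
def account_bounds(zero_matches, one_match, total_names):
    """Return (lower %, upper %, no-account %) from exact-match counts."""
    multiple_matches = total_names - zero_matches - one_match

    def percent(count):
        return round(100.0 * count / total_names, 2)

    lower = percent(one_match)                     # exactly one matching profile
    upper = percent(one_match + multiple_matches)  # at least one matching profile
    none = percent(zero_matches)                   # no matching profile
    return lower, upper, none


# Counts from Table 4.5: 463 names with zero matches, 280 with one, 1,079 in total.
print(account_bounds(463, 280, 1079))
```

With the Table 4.5 counts this gives roughly 25.95%, 57.09%, and 42.91%, which round to the
25%, 57%, and 43% figures quoted above.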
4.3.4 Limitations and Problems Encountered

The primary problem that we encountered during this experiment was that Facebook implements
a CAPTCHA (see footnote 10) system to prevent automated programs from scraping data from the
site. This prevented us from completely automating our experiment. We modified our script to
pause and notify us whenever a CAPTCHA was encountered; we then manually typed the characters
needed to solve the CAPTCHA, which allowed our script to continue. As a result, we could run
our script only when we were available to solve CAPTCHAs, so it took a full week to collect a
sufficient amount of data.
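
The pause-and-notify behavior can be sketched as a small control loop. This is an illustration
under stated assumptions rather than the modification we actually made: the page fetcher, the
CAPTCHA detector, and the notification step are passed in as hypothetical callables, because the
text does not say how a CAPTCHA page was recognized.

```python
def fetch_with_captcha_pause(fetch, url, looks_like_captcha, wait_for_human,
                             max_attempts=3):
    """Fetch a page, pausing for a human whenever a CAPTCHA page comes back.

    fetch(url) returns the page body; looks_like_captcha(page) returns True for
    a CAPTCHA page; wait_for_human(page) blocks until the operator has solved it.
    All three are assumptions supplied by the caller.
    """
    for _ in range(max_attempts):
        page = fetch(url)
        if not looks_like_captcha(page):
            return page
        wait_for_human(page)  # notify the operator and wait before retrying
    raise RuntimeError("CAPTCHA still present after manual attempts")


# Simulated run: the first response is a CAPTCHA page, the second is the results page.
responses = iter(["captcha page", "results page"])
notified = []
page = fetch_with_captcha_pause(
    fetch=lambda url: next(responses),
    url="http://example.org/search",
    looks_like_captcha=lambda body: "captcha" in body,
    wait_for_human=notified.append,
)
print(page, len(notified))
```

Because every retry requires a person, the loop runs only as fast as the operator is available.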

A further limitation of our method is that we counted a profile as a match only if its name
exactly matched the name we were searching for. This means that we were possibly undercounting
the true number of matches, because we ignored results in which the name included a modifier
like “Jr.” or “III” or in which the first name was shortened to a diminutive, as in “Mike” for
“Michael.”
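
One way this limitation might be relaxed is to normalize names before comparing them. The sketch
below is not something we implemented; the suffix list and the two-entry diminutive table are
illustrative assumptions, and a real table would need to be far larger.

```python
# Illustrative assumptions only: neither table comes from our experiment.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
DIMINUTIVES = {"mike": "michael", "matt": "matthew"}


def normalize_name(name):
    """Lower-case a name, drop generational suffixes, and expand a known diminutive."""
    tokens = [token.strip(".,").casefold() for token in name.split()]
    tokens = [token for token in tokens if token and token not in SUFFIXES]
    if tokens:
        tokens[0] = DIMINUTIVES.get(tokens[0], tokens[0])
    return " ".join(tokens)


def is_relaxed_match(candidate, target):
    """Return True if the two names agree after normalization."""
    return normalize_name(candidate) == normalize_name(target)


print(is_relaxed_match("Mike Smith Jr.", "Michael Smith"))
print(is_relaxed_match("Mark Smith", "Michael Smith"))
```

Relaxed matching trades undercounting for a higher risk of false matches, so it would need its
own validation.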

In contrast to the LinkedIn experiment (Section 4.2), in which only 7% of the names used had
more than one match, 31.14% of the names used in the Facebook experiment had more than one
match. We attribute this difference to a small change that we made in the Randomized
Combination method (Section 3.2.4) for this experiment: we neglected to test whether the
randomly selected name appeared on DoD411 more than once. We believe that this contrast with
the LinkedIn experiment demonstrates that the Randomized Combination method described in
Section 3.2.4 works well for selecting uncommon names and that the change made for the
Facebook experiment led to a less satisfactory list of uncommon names. Further experimentation
would be required to verify this conclusion.
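
The omitted check amounts to keeping only names that occur exactly once in the directory. A
minimal sketch, assuming the directory results are already available as a list of full-name
strings (the LDAP query itself is not shown, and the function name is hypothetical):

```python
from collections import Counter


def names_appearing_once(directory_names):
    """Return the names that occur exactly once, compared case-insensitively."""
    def key(name):
        return " ".join(name.split()).casefold()

    counts = Counter(key(name) for name in directory_names)
    return [name for name in directory_names if counts[key(name)] == 1]


# Hypothetical directory listing with one duplicated name.
listing = ["Ann Zyx", "Bob Qrs", "ann zyx", "Cal Tuv"]
print(names_appearing_once(listing))
```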

4.3.5  Lessons  Learned  and  Proposed  Improvements 

This experiment was useful for more than just the statistics that we gathered. We also learned
several important lessons, both about how to improve our experiment and about Facebook in
general. First, we discovered that searches for names using the “First Last” name variation
included the same results as those for the “First M. Last” and “First Middle Last” variations,
making a search for the latter two redundant. We could improve our experiment by searching only
for the “First Last” variation, then comparing the results with all three variations. This
would make the experiment more accurate and would eliminate two-thirds of the queries to
Facebook.
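
The proposed single-query approach can be sketched as one search followed by local
classification. The example assumes the returned profile names are already available as
strings; the function name and the sample results are hypothetical.

```python
def tally_single_search(results, first, middle, last):
    """Classify the results of one 'First Last' search against all three variations."""
    variations = {
        "First Last": f"{first} {last}",
        "First M. Last": f"{first} {middle[0]}. {last}",
        "First Middle Last": f"{first} {middle} {last}",
    }
    tally = {label: 0 for label in variations}
    for result in results:
        cleaned = " ".join(result.split()).casefold()
        for label, text in variations.items():
            if cleaned == text.casefold():
                tally[label] += 1
    return tally


# Hypothetical results returned by a single query for "John Smith".
returned = ["John Smith", "John R. Smith", "John Robert Smith", "Johnny Smith"]
print(tally_single_search(returned, "John", "Robert", "Smith"))
```

One query per name replaces three, which is where the two-thirds reduction comes from.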

10. Completely Automated Public Turing test to tell Computers and Humans Apart.
We found that some profiles could be identified as likely belonging to a DoD person because
one or more Friend pictures shown on the profile were in military uniform. This suggests that
we could use Friend information to help determine the likelihood of a particular profile
belonging to someone in DoD. Most profiles show a subset of up to eight of the subject’s
Friends, including both their name and picture. We observed that the subset of Friends that is
displayed changes when the profile page is refreshed. We believe that we could trivially obtain
a list of all Friends of a specific profile owner by continually refreshing the profile page
until we stop seeing new Friends. This method is only necessary if we do not sign in to
Facebook. If we sign in, we are able to see a list of all of a profile owner’s Friends without
the need to repeatedly refresh the profile page. We can use the profile owner’s list of Friends
to help determine if the subject is a DoD member, similar to work done by Jernigan and Mistree,
who used Friend associations to predict the sexual orientation of profile owners [44].
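
The “refresh until nothing new appears” idea is a sample-until-saturation loop. The sketch below
runs against simulated data only: the sampler stands in for one page refresh, and the stopping
rule (a fixed number of consecutive refreshes with no new names) is an assumption, since we did
not implement or test this procedure.

```python
import random


def collect_until_stable(sample_visible, patience=20, max_refreshes=10000):
    """Union repeated samples until `patience` refreshes in a row add nothing new."""
    seen = set()
    quiet = 0
    for _ in range(max_refreshes):
        new = set(sample_visible()) - seen
        if new:
            seen |= new
            quiet = 0
        else:
            quiet += 1
            if quiet >= patience:
                break
    return seen


# Simulation: 30 made-up names, of which eight are shown per refresh.
rng = random.Random(0)
everyone = [f"friend{i}" for i in range(30)]
found = collect_until_stable(lambda: rng.sample(everyone, 8), patience=50)
print(len(found), "of", len(everyone))
```

A larger `patience` lowers the chance of stopping before every name has been seen, at the cost
of more refreshes.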

We were able to see additional information for most of the profiles by signing in to Facebook.
We were surprised to find that so much profile information was effectively being shared with
the public, requiring only signing in as a Facebook member to view the information. Commonly
included on profile pages was information such as spouse’s name, fiancee’s name, siblings’
names, children’s names, education history, current employer, and current location down to the
city and state. Some profiles even allowed access to the profile owner’s “Wall.” Facebook’s
privacy settings do allow restricting this information to Friends or Friends-of-Friends, but
the default setting for most profile information makes it visible to Everyone. Of the 50
profiles that we viewed manually, 46 displayed some form of personal information in addition to
the person’s name, ranging from only a profile picture to all of the information named above
and more. When viewed without signing in to Facebook, six of the 50 profiles showed a picture
of the profile owner in military uniform and seven revealed the owner as a “Fan of” Facebook
pages affiliated with DoD membership (see Table 4.6). When viewed after signing in to Facebook,
11 of the 50 profiles revealed the owner as either belonging to a network of or being employed
by a DoD organization, one revealed detailed employment history including USMC ranks and
billets held and operations the owner participated in, two revealed the owner’s current
position in one of the Armed Forces, one displayed a description of the owner’s current job as
being in “nuclear propulsion,” and three revealed their owners as “Fans” of DoD-related pages.

We further discovered that signed-in Facebook members can search for profiles by name and
employer, location, or school using the page at http://www.facebook.com/search/.
For example, to find everyone who has listed their employer as the United States Navy, one can
just search for “USN,” “US Navy,” and “United States Navy” in the workplace field.
Unexpectedly, we found that by searching using this method, even profiles in which the employer
field was not directly viewable were returned in the search results. The search results were
limited, though, allowing us to see up to 500 matches.

Profile Field    Entry
Network          United States Army
                 United States Air Force
                 United States Navy
                 United States Coast Guard
                 Air Force Academy Alum ’06
Employer         United States Army
                 USN
                 US Navy
                 USAF
Position         15F1P Aircraft Electrician
                 Pilot
                 Apache crew chief
Fan of           Wounded EOD
                 3rd Infantry Division Band
                 Admiral Mike Mullen, Chairman of the Joint Chiefs of Staff
                 PERS-43
                 PERS-41: Surface Warfare Officer Assignments
                 Naval Station Newport, Rhode Island
                 Master Chief Petty Officer of the Navy (MCPON)(SS/SW)
                 Chief of Naval Operation

Table 4.6: Sample of observed Facebook profile information revealing DoD association.

We could significantly increase the accuracy of our experiment by using the signed-in version
of the Facebook search page. We contend that profiles not restricted to “Friends” or
“Friends-of-Friends” might as well be completely public because an adversary could trivially
create a false Facebook account to gain access to this information.




4.4  Determining  Percent  of  DoD  Using  MySpace 

Our purpose with this experiment was to determine the percentage of DoD personnel with
MySpace accounts. As with the previous two experiments, we assume that individuals with
uncommon names have the same likelihood of having a MySpace account as individuals with
common names. We use uncommon names as a way to sample the entire population of DoD
personnel because an individual with an uncommon name can be identified more confidently,
which lets us determine with greater certainty whether that individual has a MySpace account.


4.4.1  Experimental  Setup 

Our first step in performing this experiment was to determine the method with which we would
search for profiles on MySpace. We considered using either the public search engine offered by
MySpace (see footnote 11) or Google, which indexes MySpace member profiles. As with our
previous experiments on LinkedIn and Facebook, we tested and compared searches on variations of
several different names using both Google and MySpace. We found that the results returned by
MySpace were more complete than those returned by Google, so we used the MySpace search
engine for this experiment. The MySpace public search engine does not require a user to be
authenticated with MySpace.

11. http://searchservice.myspace.com/index.cfm?fuseaction=sitesearch.friendfinder

Figure 4.4: MySpace public search page. [Screenshot: the “Search for People” form with a
“Search By” drop-down offering All Name Fields, Name, Display Name, and Email.]

The next step was to determine the optimal parameters to the MySpace search engine to produce
the desired results. In order for our experiment to be accurate, we needed to find the best
combination of parameters that would return only matches for the name for which we were
searching. The initial options available for the search engine are to search by Name, Display
Name, Email, or all three, as shown in Figure 4.4. As an example, we searched for the name
“John R Smith” using the default setting of all three fields. We chose this name because we
thought it was likely to result in many matches. The search resulted in 18 matches. We then
searched for “John R Smith” again, but selected the option to search by Name only, which
resulted in 14 profiles. Comparing both sets of profiles, we confirmed that all 14 profiles
returned using the Name search were also in the set of profiles returned using the default of
searching all three fields. The four additional profiles returned using the default settings
all had a display name exactly matching “John R Smith.” Based on this and other similar test
queries, we determined that the most complete results were returned using the default search
setting. The query URL for this default search is:

http://searchservice.myspace.com/index.cfm?fuseaction=sitesearch.results&qry=john%20r%20smith&type=people&srchBy=All
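
The comparison between the Name-only search and the default search described above is a subset
test on the two result sets. A minimal sketch, assuming each profile can be represented by some
stable identifier (the identifiers below are made up, and the function name is hypothetical):

```python
def compare_result_sets(name_only, all_fields):
    """Check that the Name-only results are contained in the default results."""
    name_only, all_fields = set(name_only), set(all_fields)
    return {
        "name_only_is_subset": name_only <= all_fields,
        "extra_in_all_fields": sorted(all_fields - name_only),
    }


# Made-up identifiers mirroring the sizes in the text: 14 Name-only, 18 default.
name_only_hits = [f"profile{i:02d}" for i in range(14)]
default_hits = [f"profile{i:02d}" for i in range(18)]
report = compare_result_sets(name_only_hits, default_hits)
print(report["name_only_is_subset"], len(report["extra_in_all_fields"]))
```

The four extra identifiers correspond to the profiles matched only through the display name.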


In  addition  to  the  initial  options,  the  results  page  offers  more  refined  filtering  options  as  shown 
in  Figure  4.5.  We  discovered  that  these  filters  can  also  be  passed  to  the  search  engine  on  the 
initial  query  by  appending  them  to  the  URL  used  for  the  query.  These  additional  options  filter 
the  search  results  by  age,  location,  gender,  and  whether  the  profile  includes  a  photo.  An  example 
query  URL  with  a  filter  for  profiles  with  a  location  of  “United  States”  and  a  minimum  age  of 
18  looks  like: 


http://searchservice.myspace.com/index.cfm?fuseaction=sitesearch.results&qry=john%20r%20smith&type=people&srchBy=All&loc=United%20States&minAge=18
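
Both query forms can be generated from one helper. This is a minimal sketch rather than the code
in Appendices 6.9 and 6.10: the function name is hypothetical, and only the parameters that
appear in the example URLs (`fuseaction`, `qry`, `type`, `srchBy`, `loc`, `minAge`) are used; the
parameter names for the gender and photo filters are not given in the text and are therefore
omitted.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the example URLs in the text.
MYSPACE_SEARCH_URL = "http://searchservice.myspace.com/index.cfm"


def build_myspace_query(name, search_by="All", **filters):
    """Return a people-search URL; extra keyword arguments are appended as filters."""
    params = {
        "fuseaction": "sitesearch.results",
        "qry": name.lower(),
        "type": "people",
        "srchBy": search_by,
    }
    params.update(filters)
    # quote_via=quote encodes spaces as %20, as in the example URLs.
    return MYSPACE_SEARCH_URL + "?" + urlencode(params, quote_via=quote)


print(build_myspace_query("John R Smith"))
print(build_myspace_query("John R Smith", loc="United States", minAge=18))
```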
Figure 4.5: MySpace public search page, additional options. [Screenshot: the “Filter Results”
panel with Search By, Gender (Male, Female, Both), Age, location (City, State, Zip, or
Country), and Distance controls, plus check boxes for “Only show users who have photos” and
“Show names and photos only.”]


We then wrote a Python script (see Appendices 6.9 and 6.10) to automate our experiment with
these steps:

1. Retrieve an uncommon name from DoD411 using the Randomized Combination method.

2. For each uncommon name retrieved from DoD411, send three separate queries to MySpace
using each of the three name variations in Table 3.1.

3. Record the number of matches for each of the three name variations.

We  ran  the  experiment  on  January  19,  2010  and  recorded  results  for  1,183  uncommon  names  in 
less  than  four  hours. 

4.4.2  Validation 

In order to validate this experiment, we used the MySpace search page to manually search for
50 of the names with one match. All 50 of the names correctly returned one matching profile.
Of the 50 profiles, 36 were “public” (profile information is viewable by any Web user) while
the remaining 14 were “private” (certain profile information is viewable only by the user’s
approved list of “Friends”). Of the 50 profiles, 16 explicitly stated the person’s name exactly
as searched for. None of the remaining 34 profiles gave any indication that the profile owner’s
name did or did not match the name searched. Based on this result, we believe that MySpace
returns only profiles for which the name searched for matches the profile owner’s name, even in
the cases where the profile owner’s name is not explicitly shown on the profile page. Of the 50
profiles, 14 contained information explicitly confirming the person as a DoD member. Table 4.7
shows a sample of words found on profile pages implying affiliation with the DoD.

United States Navy
Occupation: U.S. Army
jarhead
Marines who are serving in Afganistan
Occupation: USAF
Career Assistance Advisor in the Air Force
ANNAPOLIS
TRAVIS AFB
Occupation: Marine
Kadena AB
stationed on ...

Table 4.7: Sample of MySpace profile information implying membership in DoD.

4.4.3  Results 

We used Randomized Combination (see Section 3.2.4) to generate 1,944 uncommon names, of which
1,183 appeared only once on DoD411. Of the 1,183 uncommon names retrieved from DoD411,
564 (47.68%) resulted in at least one match on MySpace and 259 (21.89%) had only one match.
See Table 4.8. Most of these matches were found using the “First Last” name variation (see
Table 3.1). There were two names with exactly one match using the “First M. Last” variation
and two names with matches using the “First Middle Last” variation, one with only one match
and one with two matches. Based on these results, we estimate that between 22% and 48% of
DoD personnel have MySpace accounts. We believe that at least 52% of DoD personnel do not
have MySpace accounts because there were no MySpace profiles matching their names.
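
A distribution such as the one in Table 4.8 can be tabulated directly from the per-name match
counts. The sketch below is illustrative: the function name is hypothetical, the sample counts
are made up, and the bucketing (0 through 10, then “>10”) follows the layout of Table 4.8.

```python
from collections import Counter


def tabulate_matches(match_counts, cap=10):
    """Return (label, number of names, percent) rows for 0..cap and '>cap' matches."""
    if not match_counts:
        raise ValueError("no match counts to tabulate")
    total = len(match_counts)
    # Every count above the cap falls into the single overflow bucket cap + 1.
    buckets = Counter(min(count, cap + 1) for count in match_counts)
    rows = []
    for k in range(cap + 2):
        label = str(k) if k <= cap else f">{cap}"
        number = buckets.get(k, 0)
        rows.append((label, number, round(100.0 * number / total, 2)))
    return rows


# Made-up match counts for six names.
for row in tabulate_matches([0, 0, 1, 2, 15, 11]):
    print(row)
```

Applied to the real counts, the rows for 0 and 1 matches give the 52.32% and 21.89% figures
quoted above (619 and 259 of 1,183 names).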

In comparison with the LinkedIn and Facebook experiments, in which 7% and 31.14% of the
sample names resulted in more than one match, 25.79% of the names in this experiment resulted
in more than one match. One difference from the Facebook and LinkedIn experiments is that
instead of counting only exact matches, we count all matches returned by the MySpace search
engine. The reason for this is that, unlike the names shown on Facebook and LinkedIn, MySpace
Display Names are not necessarily the same as the user’s real name, so we do not have a way of
determining whether a given profile is an exact match. We believe that this results in a higher
number of matches than would otherwise be the case, which explains the high number of names
with more than one match.

Number of Matches    Number of Names    Percent
0                    619                52.32%
1                    259                21.89%
2                    95                 8.03%
3                    63                 5.33%
4                    32                 2.70%
5                    30                 2.54%
6                    19                 1.61%
7                    11                 0.93%
8                    7                  0.59%
9                    11                 0.93%
10                   6                  0.51%
>10                  31                 2.62%

Table 4.8: Distribution of MySpace profile matches on uncommon names.


4.4.4  Lessons  Learned 

We were surprised to discover that even profiles that are “private” still display the profile
name, the user’s picture, gender, age, location (state, country), and date of last login.
Profiles also signal whether the user is currently signed in. We also found that some posts by
DoD members or their friends contained information related to deployments and even identified
specific units (see Table 4.9). When combined with the profile owner’s location, friends, and
the date of the post, these snippets convey even more specific information.


4.4.5  Proposed  Improvements 

One improvement to this experiment would be to make more use of the filters included with
the MySpace search engine to increase the likelihood of finding names within the target
population. For example, if we are searching for DoD members, we could filter the results by
age (18 years or older) and location (United States). We could also parse each profile page for
profile information or terms that would increase the likelihood that the profile belongs to a
DoD member.
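
The term-based check suggested above can be sketched as a whole-word search over the text of a
profile page. The term list is an assumption drawn from the samples in Table 4.7 and is far from
complete; the function name is hypothetical, and extracting the text from the page is not shown.

```python
import re

# Illustrative terms only, drawn from the samples in Table 4.7.
DOD_TERMS = ["USAF", "U.S. Army", "United States Navy", "Marine", "AFB"]


def find_dod_terms(profile_text, terms=DOD_TERMS):
    """Return the terms that occur as whole words or phrases in the profile text."""
    hits = []
    for term in terms:
        # The lookarounds prevent matches inside longer words.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, profile_text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


print(find_dod_terms("Occupation: USAF, stationed at Travis AFB"))
print(find_dod_terms("I collect marinescapes"))
```

A hit only raises the likelihood of DoD affiliation; it does not confirm it, so results would
still need manual validation.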
“i leave for afghanistan in march”

“i  talked  to  a  couple  ppl  and  turns  out  all  of  2nd  BCT  is  headed 
over  to  afganistan  next  august” 

“i’m  in  iraq  in  like  3  1/2  weeks” 

“its  official  i  leave  to  Afghanistan  on  Monday  April  5” 

“did  you  leave  for  afghanistan  yet??” 

“Leave  to  Afghanistan  tommorrow.” 

“hey  guys  its  [name  deleted]  this  is  what  my  platoon  has  been  doin 
in  afghanistan  for  the  past  9  months  tell  [name  deleted]  ill  be  home 
early  due  to  tramatic  brain  injury  from  getting  blown  up  13  times 
in  one  tour” 

“im  gonna  be  on  mid  tour  leave  from  afghanistan  in  feb” 

“but  i  leave  for  Afghanistan  in  november” 

“I  leave  for  afghanistan  this  month.” 


Table 4.9: Sample of MySpace posts containing information identifying specific units or
deployment schedules. Posts viewed on April 23, 2010.


Site        DoD Personnel with Accounts    DoD Personnel without Accounts
LinkedIn    11% - 18%                      >81%
Facebook    25% - 57%                      >43%
MySpace     22% - 48%                      >52%

Table 4.10: Summary of experimental findings on the percentage of DoD personnel with accounts
on LinkedIn, Facebook, and MySpace.


4.5  Results  Summary 

We believe that we have answered our original research questions after performing our
experiments. We were able to use statistical sampling to estimate the percentage of DoD
personnel with accounts on three popular social network sites. We were also able to estimate
the percentage of DoD personnel without accounts on those sites. A summary of these results is
shown in Table 4.10.
CHAPTER 5:

Other  Discoveries  and  Future  Work 


5.1  Other  Discoveries 

Through the course of our experiments with Facebook, LinkedIn, and MySpace, we made several
other discoveries that were unrelated to the experiments but interesting in themselves.

•  It  is  easy  to  find  profiles  with  DoD  affiliation. 

The  MySpace  search  page  has  a  feature  allowing  unauthenticated  users  to  search  all  of 
MySpace.  We  tested  this  search  feature  and  found  that  it  returns  blog  posts  and  posts  on 
personal  profile  pages.  We  discovered  that  it  even  returns  posts  made  by  users  who  have  a 
private  profile,  but  who  post  something  on  the  non-private  profile  page  of  another  person. 
This  could  be  useful  for  many  purposes,  but  one  which  we  tested  was  searching  for  terms 
such  as  “leave  for  afghanistan”,  which  returned  26,500  results,  many  of  which  were  posts 
including  specific  dates  that  an  individual  was  leaving  for  Afghanistan  (See  Table  4.9). 
Similar  search  phrases  could  be  used  by  an  adversary  to  find  the  pages  of  DoD  members 
or  to  gather  intelligence  on  a  specific  topic  related  to  DoD  operations. 

As discussed in Section 1.3.1, Facebook will soon be providing the ability to search all public
posts through the Facebook Platform API. Using this new feature of the API, an adversary
could conduct searches similar to those allowed by MySpace to find the pages of DoD members
and to gather intelligence on a DoD-related topic. Facebook already provides authenticated
members the ability to search status updates and wall posts using either “Posts by Friends” or
“Posts by Everyone” (http://www.facebook.com/search/). We used the “Posts by Everyone” option
to search for “afghanistan.” Table 5.1 lists a small sample of the posts that were returned.
These posts were all made within 60 minutes of our search. By employing similar searches, many
DoD members, along with their family and friends, can be easily found.

• Facebook’s haphazard changes to its privacy policy compromise the security of DoD
users.

We also found a specific example of Facebook changing the privacy setting of users from
a more restrictive to a less restrictive setting. We first set our personal profile privacy
settings to the restrictive setting of allowing profile information to be viewed by “Only
Friends.” We then joined the Naval Postgraduate School network. After joining that
network, our privacy settings were changed to a less restrictive setting, allowing profile
information to be viewed by “Friends and Networks.” This less restrictive setting would
allow our profile and posts to be viewed by members of any networks to which we belong.
Facebook did provide a notice that our privacy settings may have changed upon joining
the network (see Figure 5.1), but we were not specifically informed that the privacy settings
would allow all of our networks to view our private information. Even if we were
to immediately change the settings back to “Only Friends,” our profile information was
made less private than we wished it to be for that short period of time (see Figures 5.2 and
5.3).

“He deploys to Afghanistan in a few days.”

“Just found out I’m deploying to Afghanistan soon, go to training in ft dix NJ on
May 15th... :( pretty upset.”

“[name omitted] leaves for Afghanistan in a couple days”

“Pray for my hubby in afghanistan- wishing he was here to celebrate with us!”

“they say we are leaving from kuwait to afghanistan the 25th”

“4 more days till afghanistan”

“Afghanistan im on my way”

“our Daddy made it safe to afghanistan he’s doing great. Were so proud of
you CPL [Name omitted].”

“Delayed again waiting on flights to afghanistan and it looks like we get to spend
another weekend at home!”

“Well, it’s official. May 7 I fly from New Orleans to a MIL travel portal, then
fly to Doha, Qatar, then jump to my duty station for the next 90 days in
Afghanistan. Wish me luck.”

“going to afghanistan soon”

“I’m deploying to afghanistan Tuesday”

“its my babes last day here before heading back to afghanistan”

Table 5.1: Sample of Facebook status updates found using the search term “Afghanistan.” All
of these posts were made within 60 minutes of our search. Search done on April 11, 2010.
Figure 5.1: The only notification provided by Facebook that our privacy settings changed after
joining a network. [Screenshot: the Networks tab under My Account, showing the notice “You are
now affiliated with this network. Your profile privacy settings may have changed.” above the
entry for the Naval Postgraduate School network.]

5.2  Future  Work 

This  thesis  has  introduced  the  idea  of  using  uncommon  names  to  identify  the  profiles  of  select 
individuals,  specifically  DoD  members,  on  social  network  sites.  There  are  many  ways  in  which 
this  research  could  be  expanded. 

5.2.1  Uncommon  Names 

We  identified  three  different  methods  for  randomly  selecting  uncommon  names  from  a  directory 
(Section  3.2.3).  Further  experimentation  is  necessary  to  determine  which  of  these  three  methods 
is  most  effective.  Experiments  may  include  the  following: 

1. Create one list of uncommon names using each method, then for each list compare the
percentage of names that result in more than one match over several different social
network sites.

2. For each list of uncommon names, calculate an estimated frequency of occurrence for each
name based on the frequencies given in the 1990 Census Bureau name files for first and
last names (Section 4.1.1).
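
The second experiment could start from a simple estimate such as the one sketched below. It
assumes that first and last names occur independently, so that the frequency of a full name is
the product of the two frequencies; that assumption is ours and would itself need testing. The
frequency tables are made-up fractions, not values from the Census Bureau files, and the
function names are hypothetical.

```python
# Made-up fractions of the population, for illustration only.
HYPOTHETICAL_FIRST = {"JOHN": 0.03, "ZEBULON": 0.00001}
HYPOTHETICAL_LAST = {"SMITH": 0.01, "QUILLFEATHER": 0.000001}


def estimated_frequency(first, last, first_freq, last_freq):
    """Estimate a full-name frequency as first * last, or None if either is unlisted."""
    f = first_freq.get(first.upper())
    l = last_freq.get(last.upper())
    if f is None or l is None:
        return None
    return f * l


def rank_by_rarity(names, first_freq, last_freq):
    """Sort (first, last) pairs from rarest to most common, skipping unlisted names."""
    scored = [
        (estimated_frequency(first, last, first_freq, last_freq), (first, last))
        for first, last in names
    ]
    return [pair for score, pair in sorted(s for s in scored if s[0] is not None)]


pairs = [("John", "Smith"), ("Zebulon", "Quillfeather"), ("John", "Quillfeather")]
print(rank_by_rarity(pairs, HYPOTHETICAL_FIRST, HYPOTHETICAL_LAST))
```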

There is also research to do in exploring new methods for selecting uncommon names. One
idea involves searching a directory for extremely uncommon or unique first names as a basis
for finding uncommon full names. Another idea is to use the frequencies given for first and last
names in the 1990 Census data to generate names that are likely to be uncommon. Although all
of the techniques discussed so far have been restricted to names using the ASCII or Latin-1
character sets, these techniques can clearly be extended to Unicode names such as those in
Arabic, Chinese, Japanese, or other non-Roman scripts. Finally, future research could
explore the use of Poisson processes to model the occurrence rate and variance for a particular
name.
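
One way to read the Poisson suggestion: if each of N people independently carries a given name with small probability p, the number of carriers is approximately Poisson with rate λ = Np, and both the mean and the variance equal λ. Under that assumption, the sketch below computes the probability that a name known to occur at least once belongs to exactly one person. The function names and the example rate are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k occurrences under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_unique_given_present(lam):
    """P(exactly one carrier | at least one carrier) for rate lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return poisson_pmf(1, lam) / (1.0 - poisson_pmf(0, lam))


# Hypothetical rate: an expected 0.5 carriers of the name in the population.
print("P(unique | present) = %.3f" % prob_unique_given_present(0.5))
```

As λ approaches zero this probability approaches one, which matches the intuition that the rarest names need no disambiguation.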

5.2.2 Compiling an Online Profile

More research could be done on combining information from multiple sources to build a
comprehensive profile of an individual. We used the DoD411 directory combined with each of
three popular social network sites, but we did not attempt to combine information from all three
sites collectively. Future research could focus on searching for information about an individual
on multiple sites and combining the results to form a more complete profile of the person. A
related area for research would be to determine whether the social network profiles of DoD
members can be accurately identified based solely on their social network contacts, similar to
research by Jernigan and Mistree that predicted sexual orientation based on social network
contacts [44].

Another direction for future work would be to find a better way to determine whether the
profile matching a person’s name belongs to the person-of-interest. We relied on only two
characteristics: that the name of the person-of-interest matched the name listed on a profile,
and that the name was uncommon. Other methods could be used either in combination with or
separately from the uncommon name matching method. Methods to identify a person who does not
necessarily have an uncommon name would include:

1. Use a person’s email address as a common identifier between two or more sources.

2. Extract clues from a person’s email address that would help identify them. These might
include age, birth date, employer, etc., which are commonly listed on a person’s social
network profile page and are commonly used as portions of an email address.

3. Use a person’s list of contacts or “Friends” to identify them on other sites. This could
include Web searches for the name of the person-of-interest combined with each of their
contacts’ names, as proposed in [41].

4. Correlate social network graphs from multiple sites.

5. Use other identifying information such as location, schools, etc.
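
The second method in the list above, extracting clues from an email address, might be prototyped as follows. This is a minimal sketch: the function name, the plausible birth-year range, and the example address are all hypothetical, and real addresses would need more careful handling.

```python
import re


def email_clues(address, min_year=1930, max_year=2010):
    """Split an email address into simple clues.

    Returns a dict with the domain, the alphabetic tokens of the local
    part, and any four-digit number that falls in a plausible birth-year
    range (the range itself is an assumption).
    """
    local, _, domain = address.lower().partition("@")
    tokens = [t for t in re.split(r"[^a-z]+", local) if t]
    years = [int(y) for y in re.findall(r"(?<!\d)\d{4}(?!\d)", local)
             if min_year <= int(y) <= max_year]
    return {"domain": domain,
            "name_tokens": tokens,
            "possible_birth_years": years}


# Hypothetical address, for illustration only.
print(email_clues("jane.doe1984@example.com"))
```

Each clue could then be compared with the corresponding field on a candidate profile page.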


In the special case of using DoD411 as the directory (or searching for DoD members), these
additional methods could be used:

1. Use the email address stored on DoD411 to determine whether the person is a contractor,
civilian employee, or active duty warfighter, which can often be determined from the domain
of the email address. The “ou” field returned by a DoD411 LDAP query also provides clues
to the DoD organization of the person through a specific acronym such as USAF, USN, or
USMC.

2. Parse the social network profiles of potential matches to extract clues indicating DoD
affiliation, such as those listed in Table 4.3.

3. Use the email address stored on DoD411 to determine the location of the person. Some
email domains on DoD411 are base- or location-specific. Compare that location with the
location listed in potential matching profiles.

4. Obtain or create a list of the most common locations for DoD personnel assignments, such
as a list of the locations of all DoD bases and facilities. Profiles that specify a location
matching one of the locations on the list would have a greater likelihood of belonging to
the DoD person-of-interest.
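
The DoD-specific methods above could be combined into a simple score for each candidate profile. The sketch below shows one hypothetical way to do so. The field names, the keyword list, the sample location, and the equal weighting of clues are all assumptions made for illustration; none of them are taken from DoD411 or from Table 4.3.

```python
def score_profile(directory_entry, profile, dod_locations, dod_keywords):
    """Count how many independent clues link a profile to a directory entry.

    directory_entry: dict with optional 'ou' (organization acronym) and
                     'location' keys (hypothetical field names).
    profile:         dict with free-text 'text' and 'location' keys.
    """
    score = 0
    text = profile.get("text", "").lower()
    location = profile.get("location", "").lower()

    # Clue 1: the profile mentions the organization from the "ou" field.
    ou = directory_entry.get("ou", "").lower()
    if ou and ou in text:
        score += 1

    # Clue 2: the profile contains a keyword suggesting DoD affiliation.
    if any(word.lower() in text for word in dod_keywords):
        score += 1

    # Clue 3: the profile location matches the location implied by the
    # directory entry (for example, from a base-specific email domain).
    entry_location = directory_entry.get("location", "").lower()
    if entry_location and entry_location == location:
        score += 1

    # Clue 4: the profile location is a known DoD assignment location.
    if location in {loc.lower() for loc in dod_locations}:
        score += 1

    return score


# Hypothetical data, for illustration only.
entry = {"ou": "USN", "location": "Monterey, CA"}
profile = {"text": "Grad student, USN officer", "location": "Monterey, CA"}
print(score_profile(entry, profile, ["Monterey, CA"], ["navy", "officer"]))
```

Candidates could then be ranked by score, with the highest-scoring profile treated as the most likely match for manual review.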


5.2.3 Active Attacks

This thesis did not investigate active attacks against a person-of-interest using social network
sites. The purpose of these attacks could be to gain access to the target’s personal information,
pass false information to the target, or pass false information to the target’s contacts. There are
many possibilities for future work in this area, including ways to implement and defend against
attacks and research into the effectiveness of specific attacks. Some specific attacks include:

1. Posing as a “Friend” of the target. This could be done in several ways, such as cloning
the account of one or more of the target’s contacts or creating a profile using the personal
information of a known acquaintance of the target who does not yet have an account on
the specific social network site, then sending a “Friend” request to the target. An attacker
could also gain access to the account of someone who is already a “Friend” of the target
[18].

2. Sending “Friend” requests to the target from the account of a person who is not an
acquaintance of the target. This attack relies on the hope that the target will accept a request
from someone they don’t know. The sending account could be the attacker’s personal
account, the forged account of a celebrity, or the account of an imaginary person specially
crafted for the attack [37], [38].

3. Writing an application for the target to use. Some social network sites provide APIs
allowing developers to create applications for site members to use. Facebook Platform
allows developers to write applications that have access to users’ personal profile data and
that of their contacts [39]. An adversary could write an innocuous-looking application
and get the target or targets to enable it. The application would then gain access to the
personal profile information of the target and their contacts.

4. Gaining access to the account of an application developer, thereby allowing the attacker
access to the applications written by the developer and potentially to the personal profile
information of users who have installed those applications.

5. Using clues found on social network sites to craft personalized emails to the target or the
target’s contacts. Prior research has demonstrated that the social context of a phishing
message can lead targets to place a higher trust in the message and lower their suspicions
[33]. In the context of the DoD, this could lead to targeted phishing attempts that take
advantage of the target’s social network to make it appear that the attacker is a friend of
the target. An attack of this form could be used to solicit information from the target or
gain the target’s trust.

5.2.4 Policies and Education

More research needs to be done with respect to both civilian and military policies and privacy
laws concerning social network use. Questions that will need to be addressed by these policies
and laws include:

1. What type of training and education is needed to ensure that users are aware of issues
surrounding the use of social network sites?

2. How should institutional awareness of the privacy policies and relevant privacy settings
of social network sites be maintained?

3. Should there be recommended privacy settings and/or standards for social networking
sites?

4. How should users be educated on the recommended privacy settings?

5. Should specific social networking sites be recommended or discouraged?

6. Who should maintain awareness of the relevant privacy policies and settings for the various
social network sites and monitor them for changes? Who formulates the set of privacy
settings recommended for DoD users?

7. Should the personal online activities of those in a position to reveal proprietary or
classified information be monitored?

This is only a short list of the issues surrounding the use of social network sites. As the use of
such sites continues to become more prevalent, employers and government agencies will need
to formulate policies and procedures to address questions of this nature.

5.2.5 Other

More work could be done with the search tools that social network sites provide to determine
the extent to which posts by various individuals can be correlated to gain information about
DoD operations. A social network is by nature a graph, and the profile pages of individuals
typically provide links to closely related nodes in the graph. Research should be done to
determine whether this can be exploited by an adversary to build a more comprehensive picture
of a DoD unit and its activities.
(b) Privacy Settings, Profile Information - After joining a network.


Figure 5.2: Facebook privacy settings for profile information before and after joining a
network. The settings before joining a network restricted visibility of profile information to “Only
Friends.” After joining a network, the settings were automatically changed to the less restrictive
visibility “Friends and Networks,” allowing anyone belonging to any network in common with
us to see our profile information.

[Screenshot: Facebook “Privacy Settings ► Contact Information” page, with visibility settings
for IM Screen Name, Mobile Phone, Other Phone, Current Address, Website, Hometown, and
“Add me as a friend.”]

(a) Privacy Settings, Contact Information - Before joining a network.

(b) Privacy Settings, Contact Information - After joining a network.

Figure 5.3: Facebook privacy settings for contact information before and after joining a
network. The settings before joining a network restricted visibility of contact information to “Only
Friends.” After joining a network, the settings were automatically changed to the less restrictive
visibility “Friends and Networks,” allowing anyone belonging to any network in common with
us to see our contact information, including current address and phone number.




CHAPTER 6:
Conclusions


6.1 Conclusions

We began by presenting a history of social networking using computer networks and showed
how today’s social network sites encourage the use of real names and identities. We then
presented evidence that DoD members and their families are increasingly at risk as more and more
personal information becomes available over the Internet, and specifically through social
network sites. We proposed an original technique for finding the social network profiles of DoD
members, then demonstrated the ability to automatically identify the social network profiles of
DoD members who have uncommon names. We used this technique and statistical sampling
to determine the percentage of all DoD members with accounts on Facebook, LinkedIn, and
MySpace. In the process of performing our experiments, we discovered methods to improve
our original technique, as well as new methods for finding the social network profiles of DoD
members. We also provided examples of some of the privacy shortcomings of social network
sites, specifically Facebook.

Based on our experiments, we believe that DoD members and their families are at risk from
information that an adversary can find online. Our research has confirmed the widespread use
of social network sites by DoD members. We have also presented the results of research done
by others showing that users are widely unaware of the extent to which their personal
profile information is being shared with strangers, and that they do not understand how
to use the privacy settings on social network sites to control who has access to their personal
information.

The recent announcement by Facebook that developers will now be able to search all public
status updates also poses a possible risk to the DoD by making it easy for an adversary to
search, aggregate, and correlate postings for information related to deployments, training, and
operations.

6.2 Recommendations

We believe that there is a pressing need to educate DoD members about the implications of what
they share online. Most of the information that an adversary would be able to discover could
be suppressed by the profile owners by making their privacy settings more restrictive. Because
of the frequency with which social network sites seem to be adding new features and changing
the way their privacy settings work, there may also be a need for an organization-level activity
that will monitor the most popular social network sites for privacy changes and privacy holes
and provide recommended privacy settings for DoD members.

Appendix: Code Listings


Generate Random Names Using Census Lists

Listing 6.1: Generates random names using the 1990 Census name lists.

#
# Filename: genNames.py
#
# Description: Generates a random name using name files from the 1990 U.S.
#              Census
#
# Usage: The files "firstnames" and "lastnames" must be in the current
#        folder. These files can be found at
#        http://www.census.gov/genealogy/names/names_files.html
#
#        First, call initializeNames() to read in the name files. Then
#        call getName(), which will return a string of the form
#        "firstname lastname" where the firstname and lastname are
#        independently randomly chosen from the census bureau name lists.
#
# Author: Kenneth N. Phillips, September 2009
#
import random

firstnames = set()
lastnames = set()

# Read name files and store in sets. Each stored entry is a whole line of
# the name file; the name itself is the first field on the line.
def initializeNames():
    namefile = open("firstnames", "r")
    for name in namefile:
        firstnames.add(name)
    namefile.close()
    namefile = open("lastnames", "r")
    for name in namefile:
        lastnames.add(name)
    namefile.close()

# Return a full name that is the concatenation of a random first name and
# a random last name
def getName():
    fname = random.sample(firstnames, 1)
    lname = random.sample(lastnames, 1)
    fullname = fname[0].split()[0] + " " + lname[0].split()[0]
    return fullname.lower()

# Return a random name with a full last name and first initial.
def getName2():
    fname = random.sample(firstnames, 1)[0]   # a line from the name file
    lname = random.sample(lastnames, 1)
    # fname[0] is the first character of the line, i.e. the first initial
    fullname = fname[0].split()[0] + " " + lname[0].split()[0]
    return fullname.lower()


if __name__ == "__main__":
    initializeNames()
    print getName()
else:
    print "Initializing names..."
    initializeNames()
    print "Initialized.\n"
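
For readers working in Python 3, the sampling step of Listing 6.1 can be sketched without the name files. The function name, the `rng` parameter, and the short sample sets below are hypothetical stand-ins for the Census data.

```python
import random


def random_full_name(firstnames, lastnames, initial_only=False, rng=random):
    """Combine an independently chosen first name and last name.

    With initial_only=True the first name is reduced to its initial,
    mirroring getName2() in Listing 6.1.
    """
    first = rng.choice(sorted(firstnames))
    last = rng.choice(sorted(lastnames))
    if initial_only:
        first = first[0]
    return (first + " " + last).lower()


# Tiny stand-in sets; the real script reads the Census name files.
firsts = {"JAMES", "MARY", "ZELDA"}
lasts = {"SMITH", "PHILLIPS"}
rng = random.Random(0)  # seeded so repeated runs give the same names
print(random_full_name(firsts, lasts, rng=rng))
print(random_full_name(firsts, lasts, initial_only=True, rng=rng))
```

Passing a seeded `random.Random` instance makes an experiment repeatable, which is useful when comparing name-selection methods.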
Using LDAP to Access DoD411

Listing 6.2: Uses LDAP to search for a name on DoD411.

#
# Filename: dod411search.py
#
# Description: Searches the DoD411 LDAP server (dod411.gds.disa.mil) for
#              a specified name. Returns the name of the first match
#              found.
#
# Usage: python dod411search.py "John Doe"
#        python dod411search.py "John Doe" 100
#
# Code based on http://www.linuxjournal.com/article/6988 and code from
# Simson Garfinkel.
#
# Author: K.N. Phillips, September 2009

import ldap, sys

debug = False

# Code based on http://www.linuxjournal.com/article/6988
# Takes a string of the form "firstname lastname" and returns
# the first count matches from the DoD411 LDAP server in the form of
# a list.
def dod411SearchAll(search_term, count=100, filter='cn=*%s*'):
    server = "dod411.gds.disa.mil"
    uri = "ldap://" + server
    # Reorder the name to last*first[*middle] before searching
    search_term = search_term.split()
    if len(search_term) > 2:
        search_term = (search_term[2] + '*' + search_term[0] + '*' +
                       search_term[1]).lower()
    elif len(search_term) > 1:
        search_term = (search_term[1] + '*' + search_term[0]).lower()
    else:
        search_term = search_term[0].lower()

    # distinguished name from which to start search
    # base_dn = 'ou=PKI,ou=DoD,o=U.S. Government,c=us'
    base_dn = 'o=U.S. Government,c=us'

    # scope of search
    scope = ldap.SCOPE_SUBTREE

    # which fields to search
    filter = filter % search_term

    # which fields to retrieve
    # retrieve_attrib = ['cn', 'sn', 'givenName', 'middleName']
    retrieve_attrib = ['*']

    result_set = []
    timeout = 30  # seconds

    try:
        l = ldap.initialize(uri)
        l.simple_bind_s()
        if debug: print "Successfully bound to server.\n"
        if debug: print "Searching for %s\n" % filter
        try:
            result_id = l.search(base_dn, scope, filter, retrieve_attrib)
            # Get all results in one shot:
            # result_type, result_data = l.result(result_id, 1, timeout)
            # Get results one at a time:
            while count > 0:
                count = count - 1
                result_type, result_data = l.result(result_id, 0, timeout)
                #print result_data
                if (result_data == []):
                    break
                else:
                    if result_type == ldap.RES_SEARCH_ENTRY:
                        result_set.append(result_data)
                    else:
                        break
        except ldap.LDAPError, error_message:
            print >> sys.stderr, "* LDAP ERROR: %s *" % error_message
        except KeyError, error_message:
            print >> sys.stderr, "KeyError: ", error_message
        l.unbind_s()
    except ldap.LDAPError, error_message:
        print >> sys.stderr, "* Couldn't connect: %s *" % error_message
    return result_set

# Return only the full name of each result found on DoD411.
# Returns None if there are no results.
def dod411Search(search_term, count=100):
    results = []
    result_set = dod411SearchAll(search_term, count)
    if len(result_set) == 0:
        if debug: print "No results for %s" % search_term
        return
    for result in result_set:
        for name, value in result:
            if debug: print name, value
            if (value.has_key('middleName')):
                fullname = value['givenName'][0] + " " + \
                           value['middleName'][0] + " " + value['sn'][0]
            else:
                fullname = value['givenName'][0] + " " + value['sn'][0]
            if debug: print fullname
            results.append(fullname)
    return results


if __name__ == '__main__':
    if (len(sys.argv) > 1):
        search_term = sys.argv[1]
    else:
        search_term = "John Doe"
    if (len(sys.argv) > 2):
        num_results = int(sys.argv[2])
    else:
        num_results = 10

    # Attributes that are not printed
    omit_list = ['userCertificate;binary']
    i = 0
    for result in dod411SearchAll(search_term, num_results):
        i += 1
        # Print the first component of the distinguished name
        print str(i) + ': ' + result[0][0].split(',')[0].split('=')[1]
        for name, value in result:
            for key in value:
                if key not in omit_list:
                    print key, value[key]
        print
    print str(i) + " results found"
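
The search-term handling at the top of dod411SearchAll is easy to miss: the name is reordered to last*first*middle before it is substituted into the cn filter. This suggests, as an inference from the code rather than something the text states, that DoD411 common names are stored last name first. A standalone Python 3 sketch of just that step follows; the function name is hypothetical.

```python
def build_cn_filter(name, template="cn=*%s*"):
    """Build the LDAP filter used to search for a person's name.

    "first last"        -> cn=*last*first*
    "first middle last" -> cn=*last*first*middle*
    """
    parts = name.lower().split()
    if not parts:
        raise ValueError("name must not be empty")
    if len(parts) > 2:
        term = "*".join([parts[2], parts[0], parts[1]])
    elif len(parts) == 2:
        term = "*".join([parts[1], parts[0]])
    else:
        term = parts[0]
    return template % term


print(build_cn_filter("John Doe"))
print(build_cn_filter("John Quincy Doe"))
```

A more defensive version would also escape LDAP filter metacharacters in the name, for example with python-ldap's `ldap.filter.escape_filter_chars`, before building the filter.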
Finding Uncommon Names on DoD411 Using Randomized Combination (Method 1)

Listing 6.3: Finds uncommon names on DoD411 using Randomized Combination (Method 1).

#
# Filename: method1.py
#
# Description: Finds uncommon names on DoD411 using Randomized Combination
#
# Author: K. N. Phillips, April 2010

from dod411search import dod411Search
from genNames import getName2
import sys

debug = True

outfile = open('namelist_method1', 'a', 0)

total_names_generated = 0
total_dod_names = 0
count = 0

while (count < 1000):
    # Generate random names until one is found on DoD411
    result = None
    while (result is None):
        search_name = getName2()
        total_names_generated += 1
        result = dod411Search(search_name, 1)

    total_dod_names += 1
    name = result[0]

    # check for duplicates on dod411
    dupname = name.split()[0] + ' ' + name.split()[-1]
    if debug: print "Checking dod for duplicates on ", dupname
    # dod411Search returns None when nothing is found
    if len(dod411Search(dupname, 2) or []) > 1:
        print >> sys.stderr, "**** duplicates found for %s ****" % (dupname)
        continue
    if debug: print "no duplicates found"

    count = count + 1
    print "%d: %s" % (count, name)
    outfile.write(name + '\n')

outfile.close()

print "Total names generated: %d" % total_names_generated
print "Total names found on DoD411: %d" % total_dod_names
print "Total unique names found on DoD411: %d" % count
Finding Uncommon Names on DoD411 Using Filtered Selection (Method 2)

Listing 6.4: Finds uncommon names on DoD411 using Filtered Selection (Method 2).

#
# Filename: uncommonName.py
#
# Description: Finds uncommon names on DoD411 using Filtered Selection
#
# Author: K. N. Phillips, February 2010

import dod411search, random, time

firstnames = set()
lastnames = set()
uncommonnames = set()

outputfilename = "UncommonFullNames.txt"
logfilename = "uncommonnames_log.txt"
letters_seq = list('abcdefghijklmnopqrstuvwxyz')

# Read name files and store in sets
def initializeNames():
    try:
        firstnamefile = open('firstnames', 'r')
        for line in firstnamefile:
            name = line.split()
            firstnames.add(name[0])
        firstnamefile.close()

        lastnamefile = open('lastnames', 'r')
        for line in lastnamefile:
            name = line.split()
            lastnames.add(name[0])
        lastnamefile.close()

        # Names already found by a previous run
        uncommonnamefile = open(outputfilename, 'r')
        for line in uncommonnamefile:
            name = line.strip()
            uncommonnames.add(name)
        uncommonnamefile.close()
    except IOError, message:
        print message


# Check first 100 names returned for searchstring on DoD411 to see if they
# are uncommon. If so, add the name to outputfilename.
def getNames(searchstring=None):
    # Pick the random default at call time; a default expression in the
    # signature would be evaluated only once, when the module is loaded.
    if searchstring is None:
        searchstring = random.choice(letters_seq) + ' ' + \
                       random.choice(letters_seq)
    count = 0
    # dod411Search returns None when nothing is found
    namelist = dod411search.dod411Search(searchstring, 100) or []
    outfile = open(outputfilename, 'a', 0)
    logfile = open(logfilename, 'a', 0)
    logfile.write("Searching for %s\n" % (searchstring))
    for name in namelist:
        name1 = name.split()
        firstname = name1[0]
        lastname = name1[-1]
        # A name is uncommon if either part is missing from the Census lists
        if ((firstname not in firstnames) or (lastname not in lastnames)):
            t = time.time()
            if name not in uncommonnames:
                outfile.write("%s\n" % (name))
                logfile.write("Found %s,%s\n" % (name, t))
                print name
                uncommonnames.add(name)
                count = count + 1
            else:
                print "****Already found %s\n" % (name)
                logfile.write("****Already found %s,%s\n" % (name, t))
    logfile.close()
    outfile.close()
    return count


if __name__ == "__main__":
    initializeNames()
    logfile = open(logfilename, 'a', 0)
    t0 = time.time()
    logfile.write("Starting time: %s\n" % (t0))
    logfile.close()
    count = 0
    # Search DoD411 for every two-letter combination "a a" through "z z"
    for letter1 in letters_seq:
        for letter2 in letters_seq:
            searchstring = letter1 + ' ' + letter2
            getNames(searchstring)
    logfile = open(logfilename, 'a', 0)
    logfile.write("Ending time: %s\n" % (time.time()))
    logfile.write("Duration: %s\n" % (time.time() - t0))
    logfile.close()
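
The core test in Listing 6.4 is the Filtered Selection rule: a directory name is kept when its first name or its last name is absent from the Census lists. A self-contained Python 3 sketch of that rule follows. The function name and the small sample sets are hypothetical, and the case-insensitive comparison is an added assumption.

```python
def is_uncommon(full_name, common_first, common_last):
    """Return True if the first or last name is missing from the common lists.

    Comparison is case-insensitive; middle names are ignored, as in
    Listing 6.4.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return False  # cannot judge a name without both parts
    first, last = parts[0].upper(), parts[-1].upper()
    return first not in common_first or last not in common_last


# Small stand-in sets; the real script loads the Census name files.
common_first = {"JOHN", "MARY"}
common_last = {"SMITH", "DOE"}
for candidate in ["John Doe", "John Q Zyzzyx", "Xiomara Smith"]:
    print(candidate, is_uncommon(candidate, common_first, common_last))
```

Keeping the rule in a pure function separates it from the directory queries and file logging, so it can be tested without network access.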
Comparing the Three Methods

Listing 6.5: Compares the three methods for finding an uncommon name using whitepages.com.

#
# Filename: compare_name_methods.py
#
# Description: Takes an arbitrary number of name files and for each name
#              in each file, retrieves the number of people in the U.S.
#              with that name from whitepages.com. The results are
#              written to files with the same name as the input files,
#              but with the suffix ".counts". Each input file is expected
#              to consist of a list of names, one per line, of the form
#              "firstname [optional middle name] lastname".
#
# Usage: getNameCounts(file1, file2, file3, ...)
#
# Author: K. N. Phillips, April 2010

import urllib2, sys
from BeautifulSoup import BeautifulSoup

def search(name="john doe"):
    """Search whitepages.com for name and return the number of people
    with that name in the U.S. Returns -1 on error."""
    firstname = name.split()[0]
    lastname = name.split()[-1]

    base_url = "http://names.whitepages.com"
    query = "/%s/%s" % (firstname, lastname)
    url = base_url + query
    request = urllib2.Request(url)

    try:
        result = urllib2.urlopen(request).read()
        soup = BeautifulSoup(result)
        # Pull out the number of matches
        match_count = soup.findAll(
            attrs={"id": "num_count_with_link"})[0].a.string.split()[0]
        match_count = int(match_count.replace(',', ''))
    except urllib2.URLError, error_message:
        # Only HTTP errors carry a status code
        if (getattr(error_message, 'code', None) == 404):
            match_count = 0  # no matches for that name
        else:
            print >> sys.stderr, error_message
            return -1

    print "%d matches for %s" % (match_count, (firstname + ' ' + lastname))
    return match_count


def readFiles(*filenames):
    """Reads each line of the given filenames into a list, one list
    for each file."""
    result = []
    for filename in filenames:
        names = []
        infile = open(filename, 'r')
        for line in infile:
            names.append(line.strip())
        result.append(names)
        infile.close()
    return result


def getNameCounts(*filenames):
    """Reads files containing a list of names and writes files out with
    counts for how many times each name appears in the U.S. according to
    whitepages.com."""
    name_lists = readFiles(*filenames)
    results = []
    for names in name_lists:
        outlist = []
        for name in names:
            firstname = name.split()[0]
            lastname = name.split()[-1]
            count = search(firstname + ' ' + lastname)
            outlist.append((name, count))
        results.append(outlist)

    # Write one "name,count" line per name to <input file>.counts
    i = 0
    for outlist in results:
        filename = filenames[i] + ".counts"
        outfile = open(filename, 'w', 0)
        for item in outlist:
            outfile.write(item[0] + ',' + str(item[1]) + '\n')
        outfile.close()
        i += 1


if __name__ == "__main__":
    print "Usage: getNameCounts('filename1', 'filename2', ..., 'filenameN')"
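
Once getNameCounts has written its output files, the methods can be compared by the share of names that whitepages.com attributes to at most one person. The Python 3 sketch below parses lines in the "name,count" form written by Listing 6.5. The function name, the threshold, and the sample lines are hypothetical.

```python
def fraction_unique(lines, max_count=1):
    """Fraction of names whose count is between 0 and max_count inclusive.

    Each line has the form "name,count". Lines with count -1 (the error
    value returned by search() in Listing 6.5) are skipped.
    """
    counts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _name, _, count_text = line.rpartition(",")
        count = int(count_text)
        if count >= 0:
            counts.append(count)
    if not counts:
        return 0.0
    return sum(1 for c in counts if c <= max_count) / len(counts)


# Made-up sample lines, for illustration only.
sample = ["jane q public,1", "john doe,2500", "a b zyzzyx,0", "err name,-1"]
print("%.2f" % fraction_unique(sample))
```

Running this over the output file for each method gives one number per method, which makes the comparison in Section 5.2.1 straightforward to tabulate.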
LinkedIn Search Script

Listing  6.6:  Searches  Linkedln  for  a  name. 

# 

#  Filename:  linkedinsearch  .  py 

# 

#  Description:  Searches  for  Linkedin.com  members  using  Google.  Returns  all 

#  exact  matches  . 

# 

#  Input:  A  string  of  the  form  ’’FirstName  MiddleName  LastName” 

# 

#  Output:  A  tuple  of  the  form  ( numberofmatchesfound  ,  {url:name,  url:name, 

#  ...}) 

# 

#  R  efe  rences  :  http  ://  code  .  google  .  com/  apis  /  ajaxsearch  /  documentation  /#  f  on  j  e 

#  http: //code,  google,  com/  apis  /  ajaxsearch  /  documentation  / 

#  reference  .  html#  -  intro  _/  o n j  e 

# 

#  Author:  K.  N.  Phillips  ,  November  2009 

import  sys  ,  urllib2  ,  re  ,  string  ,  simplejson  ,  time  ,  unicodedata 
debug  =  False 

def  removePunctuation  ( s  = 

’’’’’’Return  string  s  with  all  punctuation  replaced  by  the  empty  string. 
Punctuation  is  defined  as  anything  in  string  .  punctuation .  ””” 
newstring  =  ’  ’ 
for  char  in  s : 

if  char  not  in  string  .  punctuation  : 
newstring  =  newstring  +  char 

return  newstring 

def search(search_text):
    """Searches for search_text on linkedin.com using Google. Returns a
    list of URLs."""
    number_found = 0
    dict = {}
    result_list = []
    query = search_text.replace("
    if debug: print "Searching for ", query

    base_url = 'http://ajax.googleapis.com/ajax/services/search/web'
    search_options = '?v=1.0&rsz=large&hl=en&filter=0'
    linkedin_query = ('&q=\"' + query + '\"+-/updates+-/dir+-/directory+'
                      '-/groupInvitation+site:www.linkedin.com')
    start_page = 0  # google only returns the first 8 results. Increment
                    # this by 8 to get the next set.
    search_url = (base_url + search_options + linkedin_query + '&start=' +
                  str(start_page))
    if debug: print "Using url: ", search_url
    request = urllib2.Request(search_url)
    has_error = True
    while (has_error):
        try:
            response = urllib2.urlopen(request)
            json = simplejson.loads(response.read())
            results = json['responseData']['results']
            has_error = False
        except urllib2.URLError, error_message:
            print >> sys.stderr, error_message, "Pausing 3 seconds..."
            time.sleep(3)
        except TypeError, error_message:
            print >> sys.stderr, error_message, "Pausing 3 seconds..."
            time.sleep(3)

    if len(results) == 0:
        if debug: print 'No matches'
    else:
        numresults = json['responseData']['cursor']
        if debug: print 'total results: ' + str(numresults['estimatedResultCount'])
        if debug: print 'curr page: ' + str(numresults['currentPageIndex'])
        current_page = numresults['currentPageIndex']
        number_found = numresults['estimatedResultCount']
        for result in results:
            title = result['titleNoFormatting'].lower()
            if debug:
                print "Title: ", result['title']
                print "Title noformatting: ", result['titleNoFormatting']
            # Extract just the name from the title
            title = re.sub(r'((jr)|(sr)|(IV)|(III)|(II)|(,.*))?(\.)?( - .*?linkedin)'
                           r'|( - .*?\.\.\.)', '', title)
            title = removePunctuation(title.strip())
            title = unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
            # print "Title:", title
            url = result['url'].lower()
            if (title == search_text.lower()):  # add to returned results
                if debug: print 'match found: ',
                dict[url] = title
                result_list.append(url)
            # else:
            #     print >> sys.stderr, 'DOES NOT MATCH \"' + title +
            #     print >> sys.stderr, "Search string used: ", search_url
            if debug: print title + ': ' + url
    return result_list  # return (number_found, dict)


if __name__ == "__main__":
    if (len(sys.argv) > 2):
        print "opening " + str(sys.argv[2]) + " for stderr"
        sys.stderr = open(sys.argv[2], "w", 0)

    if (len(sys.argv) > 1):
        result = search(sys.argv[1]);
    else:
        result = search("John Smith")

    for url in result:
        print url
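
The exact-match step in the listing above compares a normalized page title with the search name. The following is a minimal Python 3 sketch of that comparison only, not the author's code: the function names `normalize_name` and `exact_matches` and the sample result records are hypothetical, and the sketch omits the regular expression that the listing uses to strip name suffixes and the trailing "- LinkedIn" text.

```python
import string
import unicodedata


def normalize_name(text):
    """Lower-case, ASCII-fold and strip punctuation from a name or title."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with 'ignore' then drops the combining marks.
    folded = unicodedata.normalize("NFKD", text)
    folded = folded.encode("ascii", "ignore").decode("ascii")
    no_punct = "".join(ch for ch in folded if ch not in string.punctuation)
    # Collapse runs of whitespace so spacing differences do not matter.
    return " ".join(no_punct.lower().split())


def exact_matches(search_name, results):
    """Return the URLs whose normalized title equals the normalized name."""
    wanted = normalize_name(search_name)
    return [r["url"].lower() for r in results
            if normalize_name(r["title"]) == wanted]


# Hypothetical search results, used only to exercise the functions.
sample = [
    {"title": "José Q. Example", "url": "http://www.linkedin.com/in/JoseExample"},
    {"title": "Jose Example", "url": "http://www.linkedin.com/pub/jose-example/1"},
]
print(exact_matches("Jose Example", sample))
```

Only the second record is returned, because the first title still contains a middle initial after normalization. This is why the cross-referencing scripts query each name in three variations.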
LinkedIn Search Script

Listing 6.7: Finds an uncommon name on DoD411 using Randomized Combination (Method
1), then searches for that name on LinkedIn.

#
# Filename: crossDoD_linkedin.py
#
# Description: Generates a random name, attempts to find a match for
#              that name on the DoD411 LDAP server, and if found attempts
#              to find a match for the name on LinkedIn.
#
# Input: An integer for the number of names to cross against LinkedIn
#
# Output: A list of matching names and the number of times each appears in
#         LinkedIn.
#
# Usage: python crossDoD_linkedin.py 10 outfilename statfilename errorfilename
#        or python crossDoD_linkedin.py 10 | tee -a outfilename
#        or python crossDoD_linkedin.py 10
#
# Example: python crossDoD_linkedin.py 10 results.txt stats.txt err.txt
#
# Author: K. N. Phillips, November 2009

import sys, time
import linkedinsearch
from dod411search import *
from genNames import *

debug = False

# *********** Method Definitions ************

# Takes a string representing a full name as input, searches Linkedin for
# that name, then the same name but with only the middle initial instead of
# full middle name, then the same name but without the middle name. Prints
# the number of matches found on Linkedin for each of the three versions of
# the name. The output is of the following form:
# Name, FullNameMatchesExact, FullNameMatchesTotal, MiddleInitMatchesExact,
# MiddleInitMatchesTotal, NoMiddleNameMatchesExact,
# NoMiddleNameMatchesTotal
def getLinkedinMatches(fullname):
    foundMatch = False
    temp = fullname.split()
    if len(temp) == 3:
        name_nm = temp[0] + " " + temp[2]  # remove middle name
        name_mi = temp[0] + " " + temp[1][0] + " " + temp[2]  # name with middle initial
        if len(temp[1]) == 1:  # middle name is only an initial
            name_fml = None
        else:
            name_fml = fullname
    elif len(temp) == 2:
        name_fml = None
        name_mi = None
        name_nm = fullname
    else:
        print >> sys.stderr, "Error with name", fullname
        return False

    if debug:
        print "Full first middle last: ", name_fml
        print "Name with middle init: ", name_mi
        print "Name with no middle: ", name_nm

    print fullname,

    # Get result for full name
    if (name_fml is not None):
        linkedin_results = linkedinsearch.search(name_fml)
        if (len(linkedin_results) == 0):
            print ",", 0,
        else:
            print ",", len(linkedin_results),
            foundMatch = True
        print ",", len(linkedin_results),
    else:
        print ",0,0",

    # Get results for name with only middle initial
    if (name_mi is not None):
        linkedin_results = linkedinsearch.search(name_mi)
        if (len(linkedin_results) == 0):
            print ",", 0,
        else:
            print ",", len(linkedin_results),
            foundMatch = True
        print ",", len(linkedin_results),
    else:
        print ",0,0",

    # Get results for name with no middle name
    if (name_nm is not None):
        linkedin_results = linkedinsearch.search(name_nm)
        if (len(linkedin_results) == 0):
            print ",", 0,
        else:
            print ",", len(linkedin_results),
            foundMatch = True
        print ",", len(linkedin_results)
    else:
        print ",0,0"

    sys.stdout.flush()
    return foundMatch


# *********** Script ************

if (len(sys.argv) > 1):
    count = int(sys.argv[1])  # number of names to retrieve and test against Linkedin
else:
    count = 1

if (len(sys.argv) > 2):
    fout = open(sys.argv[2], "a", 0)  # open log file for appending w/no buffering
    sys.stdout = fout

if (len(sys.argv) > 3):
    statout = open(sys.argv[3], "a", 0)
else: statout = sys.stdout

if (len(sys.argv) > 4):
    sys.stderr = open(sys.argv[4], "a", 0)


result_list = []
match_list = []
nonmatch_list = []

total_names_generated = 0
total_DoD411_matches = 0

# initializeNames()

# print '''Name, FullNameMatchesExact, FullNameMatchesTotal,
# MiddleInitMatchesExact, MiddleInitMatchesTotal, NoMiddleNameMatchesExact,
# NoMiddleNameMatchesTotal\n'''

# Get a random name, search for it on DoD411 Ldap server, and then search
# for the first match found on Linkedin.
while (count > 0):
    search_name = getName2()
    total_names_generated += 1
    result_list = dod411Search(search_name, 1)

    while (result_list is None):
        search_name = getName2()
        total_names_generated += 1
        result_list = dod411Search(search_name, 1)

    for name in result_list:
        dupname = name.split()[0] + ' ' + name.split()[-1]  # check for duplicates on dod411
        if debug: print "Checking dod for duplicates on ", dupname
        if len(dod411Search(dupname, 2)) > 1:
            if debug: print "**** duplicates found *****"
            continue
        if debug: print "no duplicates found"

        count = count - 1
        total_DoD411_matches += 1
        match = getLinkedinMatches(name)
        if (match):
            match_list.append(name)
        else:
            nonmatch_list.append(name)

    time.sleep(0.5)  # sleep for .5 seconds in between names to be nicer to
                     # the Linkedin and DoD411 servers and avoid looking too
                     # suspicious.


print >> statout
print >> statout, "Total Names Generated: ", total_names_generated
print >> statout, "Total Names found on DoD411: ", total_DoD411_matches
print >> statout, "Total Linkedin Non-Matches: ", len(nonmatch_list)
print >> statout, "Total Linkedin Matches: ", len(match_list)

print >> statout, "Non matches: ", nonmatch_list
print >> statout, "Matches: ", match_list
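
The three name variations that the listing above searches for (full name, middle initial only, and no middle name) can be derived in isolation. The following is a minimal Python 3 sketch under stated assumptions: the function name `name_variants` is hypothetical, and it raises `ValueError` for names that do not have two or three parts, whereas the listing prints an error and returns `False`.

```python
def name_variants(fullname):
    """Return (first-middle-last, first-initial-last, first-last).

    An entry is None when that variation does not exist for the name,
    mirroring the None checks in the listing.
    """
    parts = fullname.split()
    if len(parts) == 3:
        first, middle, last = parts
        name_nm = first + " " + last
        name_mi = first + " " + middle[0] + " " + last
        # A one-letter middle name is already just an initial, so there
        # is no separate full-middle-name variation to search for.
        name_fml = None if len(middle) == 1 else fullname
    elif len(parts) == 2:
        name_fml, name_mi, name_nm = None, None, fullname
    else:
        raise ValueError("expected two or three name parts: %r" % fullname)
    return name_fml, name_mi, name_nm


print(name_variants("John Quincy Adams"))
print(name_variants("John Q Adams"))
print(name_variants("John Adams"))
```

Returning the three variations as one tuple keeps the derivation separate from the searching and printing, which the listing interleaves.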
Facebook Search Script

Listing 6.8: Searches Facebook for a name.


#
# Filename: fbsearch.py
#
# Description: Searches for Facebook members using the facebook.com public
#              search page. Returns up to the first 10 matches.
#
# Input: A string of the form "FirstName MiddleName LastName"
#
# Output: A tuple of the form (numberofmatchesfound, {url:name, url:name,
#         ...})
#
# Author: K. N. Phillips, September 2009
# Modified: K. N. Phillips, December 2009 - updated output of search
#           function to be just a list of URLs.


import urllib2, re, sys, os, platform, time
from BeautifulSoup import BeautifulSoup
debug = False


def search(search_text):
    """Search facebook for search_text and return a list of URLs"""
    newUrl = ""
    security_check_number = 0
    numFound = 0
    result = []

    query = search_text.replace("
    if debug: print "Searching Facebook for ", query

    facebooksearch_url = "http://www.facebook.com/srch.php?nm=" + query
    request = urllib2.Request(facebooksearch_url)

    has_error = True
    while (has_error):
        try:
            facebooksearch_results_html = urllib2.urlopen(request).read()
            has_error = False
        except urllib2.URLError, error_message:
            print >> sys.stderr, error_message, "Pausing 3 seconds..."
            time.sleep(3)

    soup = BeautifulSoup(facebooksearch_results_html)

    # Make sure will wait if Security Check required on Facebook.
    while "Security Check Required" in soup.title.string:
        print >> sys.stderr, "Error: Security Check Required by Facebook"
        raw_input(newUrl)

        has_error = True
        while (has_error):
            try:
                facebooksearch_results_html = urllib2.urlopen(request).read()
                has_error = False
            except urllib2.URLError, error_message:
                print >> sys.stderr, error_message, "Pausing 3 seconds..."
                time.sleep(3)

        soup = BeautifulSoup(facebooksearch_results_html)


    # Extract from HTML the number of people found using the following 4
    # cases:
    # 1) No summary information -> no match found
    # 2) "Displaying the only person that matches "JASON BLUST"."
    # 3) "Displaying all 10 people that match "PAUL HEMMER"."
    # 4) "Displaying 1 - 10 of 43 people who match "SCOTT ZANE""
    summarytext = soup.findAll(attrs={"class": "summary"})

    if len(summarytext) > 0:
        summarytext = summarytext[0].strong.string
        # Case 2, only one match found
        if (summarytext.startswith('Displaying the only')):
            numFound = 1
        # Case 3,4
        # The number of people is the last number in summarytext
        else:
            numFound = re.findall('[0-9]+', summarytext)
            numFound = numFound[-1]
        if debug:
            print summarytext
            print 'Number found: ', numFound
    # Case 1, no matches found for that name
    else:
        numFound = 0
        # return (numFound, result)
        return result


    # Extract names returned by the search from the HTML page
    for dd in soup.findAll('dd'):
        result_url = dd.a['href']
        result_name = dd.a.string

        if result_name.lower() == search_text.lower():
            if result_url in result:
                print >> sys.stderr, "Already Seen this one"
            else:
                # result[result_url] = result_name
                result.append(result_url)
    # return (numFound, result)
    return result


if __name__ == "__main__":
    if (len(sys.argv) > 1):
        result = search(sys.argv[1]);
    else:
        result = search("John Smith")

    print "Found", len(result), "matches"
    for url in result:
        print url
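
The four summary-text cases handled in the listing above can be checked independently of any web request. The following is a minimal Python 3 sketch, assuming the summary strings have the forms quoted in the listing's comments; the function name `count_from_summary` is hypothetical, and unlike the listing it always returns an integer.

```python
import re


def count_from_summary(summary):
    """Return the number of people reported by a search summary string."""
    # Case 1: no summary information means no match was found.
    if not summary:
        return 0
    # Case 2: exactly one person matched.
    if summary.startswith("Displaying the only"):
        return 1
    # Cases 3 and 4: the total is the last number in the summary text.
    numbers = re.findall(r"[0-9]+", summary)
    return int(numbers[-1]) if numbers else 0


print(count_from_summary('Displaying the only person that matches "JASON BLUST".'))
print(count_from_summary('Displaying all 10 people that match "PAUL HEMMER".'))
print(count_from_summary('Displaying 1 - 10 of 43 people who match "SCOTT ZANE"'))
```

Taking the last number relies on the searched name containing no digits, which holds for the names used in this work.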
MySpace Search Script

Listing 6.9: Searches MySpace for a name.

#
# Filename: myspace_search.py
#
# Description: Searches for myspace.com members using the Myspace public
#              search service at
#              http://searchservice.myspace.com/index.cfm?fuseaction=sitesearch.friendfinder
#
#
# Input: Name to search for
#
# Output: A list of URLs to profile pages that match the name. Note: Only
#         returns the first 10 results
#
# Usage: myspace_search.search('Nate Phillips')
#        or python myspace_search.py 'Nate Phillips'
#        or python myspace_search.py 'myemail@email.com'
#
# Author: K. N. Phillips, December 2009

import urllib2, re, sys, os, platform, time
from BeautifulSoup import BeautifulSoup
debug = False

def search(name):
    """Search Myspace for name or email address and return a list of URLs
    to profile pages matching the specified name. Returns a tuple of the
    form (list of urls, total matches)."""
    numFound = 0
    result = []

    query = name.replace(" ", "%20")
    if debug: print "Searching Myspace for ", query

    myspace_search_url = ("http://searchservice.myspace.com/index.cfm?"
                          "fuseaction=sitesearch.results&qry=")
    myspace_search_options = "&type=people&srchBy=All"
    search_url = myspace_search_url + query + myspace_search_options

    request = urllib2.Request(search_url)

    has_error = True
    while (has_error):
        try:
            search_results_html = urllib2.urlopen(request).read()
            has_error = False
        except urllib2.URLError, error_message:
            print >> sys.stderr, error_message, "Pausing 3 seconds..."
            time.sleep(3)

    soup = BeautifulSoup(search_results_html)
    if debug:
        file = open('test.html', 'w')
        file.write(soup.prettify())
        file.close()

    # Extract number of results found from the HTML
    summarytext = soup.findAll(attrs={"class": "displaySummary"})

    if len(summarytext) > 0:  # Found some results
        summarytext = summarytext[0].span.nextSibling  # ' of 500 results for'
        numFound = re.search('[0-9]+', summarytext)
        if (numFound):
            numFound = int(numFound.group())
        else:
            numFound = 0

    for res in soup.findAll(attrs={"class": "msProfileLink"}):
        url = res.a['href']
        result.append(url)

    if debug: print "Found %d total matches" % numFound
    return (result, numFound)

# ##################################################################
if __name__ == "__main__":
    if (len(sys.argv) > 1):
        result, numFound = search(sys.argv[1]);
    else:
        result, numFound = search("John Smith")

    print "Found", len(result), "urls"
    for url in result:
        print url
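
All three search scripts build a query URL and retry the request after a three-second pause when it fails. The following is a minimal Python 3 sketch of that pattern, not the author's code. The names `build_search_url` and `fetch_with_retry` are hypothetical, as are the injected `fetch` and `sleep` callables (which make the loop testable without network access) and the `max_attempts` cap; the listing itself retries indefinitely. The URL constants are copied from the listing above, and `urllib.parse.quote` is used in place of the listing's manual replacement of spaces.

```python
import time
from urllib.parse import quote

# URL pieces as they appear in the listing above.
BASE = ("http://searchservice.myspace.com/index.cfm?"
        "fuseaction=sitesearch.results&qry=")
OPTIONS = "&type=people&srchBy=All"


def build_search_url(name):
    """Percent-encode the name and place it in the search URL."""
    return BASE + quote(name) + OPTIONS


def fetch_with_retry(fetch, url, pause_seconds=3.0, max_attempts=5,
                     sleep=time.sleep):
    """Call fetch(url), pausing and retrying when it raises OSError.

    urllib.error.URLError is a subclass of OSError in Python 3, so a
    fetch function built on urllib.request.urlopen is covered.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch(url)
        except OSError as error:
            last_error = error
            sleep(pause_seconds)
    raise last_error


print(build_search_url("Nate Phillips"))
```

A real caller could pass `lambda u: urllib.request.urlopen(u).read()` as `fetch`. Capping the attempts is a design choice made in this sketch so that a permanently unreachable server ends the run instead of looping forever.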
Retrieve Uncommon Names from DoD411 and Query MySpace

Listing 6.10: Retrieves uncommon names from DoD411 using Method 1, then queries MySpace
for all three name variations of each name.

#
# Filename: crossDoD_myspace.py
#
# Description: Generates a random name, attempts to find a match for
#              that name on the DoD411 LDAP server, and if found attempts
#              to find a match for the name on MySpace.
#
# Input: An integer for the number of names to cross against MySpace
#
# Output: A list of matching names and the number of times each appears in
#         MySpace.
#
# Usage: python crossDoD_myspace.py 10 outfilename statfilename errorfilename
#        or python crossDoD_myspace.py 10 | tee -a outfilename
#        or python crossDoD_myspace.py 10
#
# Example: python crossDoD_myspace.py 10 results.txt stats.txt err.txt
#
# Author: K. N. Phillips, December 2009

import sys, time
import myspace_search
from dod411search import *
from genNames import *

debug = False

# *********** Method Definitions ************

# Takes a string representing a full name as input, searches MySpace for
# that name, then the same name but with only the middle initial instead of
# full middle name, then the same name but without the middle name. Prints
# the number of matches found on MySpace for each of the three versions of
# the name. The output is of the following form:
# Name, FirstMiddleLastNumberofURLs, FirstMiddleLastNumberofTotalMatches,
# FirstMILastNumberofURLs, FirstMILastNumberofTotalMatches,
# FirstLastNumberofURLs, FirstLastNumberofTotalMatches
def getMyspaceMatches(fullname):
    foundMatch = False
    temp = fullname.split()
    if (len(temp) == 3):
        name_nm = temp[0] + " " + temp[2]  # remove middle name
        name_mi = temp[0] + " " + temp[1][0] + " " + temp[2]  # name with middle initial
        if len(temp[1]) == 1:  # middle name is only an initial
            name_fml = None
        else:
            name_fml = fullname
    elif (len(temp) == 2):
        name_fml = None
        name_mi = None
        name_nm = fullname
    else:
        print >> sys.stderr, "Error with name", fullname
        return False

    if debug:
        print "Full first middle last: ", name_fml
        print "Name with middle init: ", name_mi
        print "Name with no middle: ", name_nm

    print fullname,

    # Get result for full name
    if (name_fml is not None):
        myspace_urls, myspace_num_matches = myspace_search.search(name_fml)
        if (len(myspace_urls) == 0):
            print ",", 0,
        else:
            print ",", len(myspace_urls),
            foundMatch = True
        print ",", myspace_num_matches,
    else:
        print ",0,0",

    # Get results for name with only middle initial
    if (name_mi is not None):
        myspace_urls, myspace_num_matches = myspace_search.search(name_mi)
        if (len(myspace_urls) == 0):
            print ",", 0,
        else:
            print ",", len(myspace_urls),
            foundMatch = True
        print ",", myspace_num_matches,
    else:
        print ",0,0",

    # Get results for name with no middle name
    if (name_nm is not None):
        myspace_urls, myspace_num_matches = myspace_search.search(name_nm)
        if (len(myspace_urls) == 0):
            print ",", 0,
        else:
            print ",", len(myspace_urls),
            foundMatch = True
        print ",", myspace_num_matches
    else:
        print ",0,0"

    sys.stdout.flush()
    return foundMatch


# *********** Script ************

if (len(sys.argv) > 1):
    count = int(sys.argv[1])  # number of names to retrieve and test against Myspace
else:
    count = 1

if (len(sys.argv) > 2):
    fout = open(sys.argv[2], "a", 0)  # open log file for appending w/no buffering
    sys.stdout = fout

if (len(sys.argv) > 3):
    statout = open(sys.argv[3], "a", 0)
else: statout = sys.stdout

if (len(sys.argv) > 4):
    sys.stderr = open(sys.argv[4], "a", 0)

result_list = []
match_list = []
nonmatch_list = []

total_names_generated = 0
total_DoD411_matches = 0

# initializeNames()

# print '''Name, FirstMiddleLastNumberofURLs,
# FirstMiddleLastNumberofTotalMatches, FirstMILastNumberofURLs,
# FirstMILastNumberofTotalMatches, FirstLastNumberofURLs,
# FirstLastNumberofTotalMatches\n'''

# Get a random name, search for it on DoD411 Ldap server, and then search
# Myspace for the first match found.
while (count > 0):
    result_list = None
    while (result_list is None):
        search_name = getName2()
        total_names_generated += 1
        result_list = dod411Search(search_name, 1)

    for name in result_list:
        dupname = name.split()[0] + ' ' + name.split()[-1]  # check for duplicates on dod411
        if debug: print "Checking dod for duplicates on ", dupname
        if len(dod411Search(dupname, 2)) > 1:
            print >> sys.stderr, "**** duplicates found for %s ****" % (dupname)
            continue
        if debug: print "no duplicates found"

        count = count - 1
        total_DoD411_matches += 1
        match = getMyspaceMatches(name)
        if (match):
            match_list.append(name)
        else:
            nonmatch_list.append(name)

    time.sleep(0.5)  # sleep for .5 seconds in between names to be nicer to
                     # the Myspace and DoD411 servers and avoid looking too
                     # suspicious.


print >> statout
print >> statout, "Total Names Generated: ", total_names_generated
print >> statout, "Total Names found on DoD411: ", total_DoD411_matches
print >> statout, "Total Myspace Non-Matches: ", len(nonmatch_list)
print >> statout, "Total Myspace Matches: ", len(match_list)

print >> statout, "Non matches: ", nonmatch_list
print >> statout, "Matches: ", match_list
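
The duplicate check in the listing above keeps a name only when its first-name and last-name pair returns no more than one directory entry. The following is a minimal Python 3 sketch of that filter. The `lookup` callable is a hypothetical stand-in for `dod411Search` and is assumed to return a list of matching entries; the in-memory `directory`, the `fake_lookup` helper, and the function names are invented for illustration.

```python
def is_unique_first_last(name, lookup):
    """True when at most one directory entry shares the first and last name."""
    parts = name.split()
    dupname = parts[0] + " " + parts[-1]
    return len(lookup(dupname)) <= 1


def sample_unique_names(candidates, lookup, wanted):
    """Collect up to `wanted` names that pass the uniqueness check."""
    accepted = []
    for name in candidates:
        if len(accepted) >= wanted:
            break
        if is_unique_first_last(name, lookup):
            accepted.append(name)
    return accepted


# Hypothetical directory contents used only to exercise the filter.
directory = ["Zed Q Quixote", "John A Smith", "John B Smith"]


def fake_lookup(first_last):
    """Return every directory entry with the given first and last name."""
    first, last = first_last.split()
    return [entry for entry in directory
            if entry.split()[0] == first and entry.split()[-1] == last]


print(sample_unique_names(directory, fake_lookup, 2))
```

Both "John Smith" entries are rejected because the first and last name alone do not identify one person, which is the ambiguity the uncommon-name approach is designed to avoid.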




Initial Distribution List


1. Defense Technical Information Center
Ft. Belvoir, Virginia

2. Dudley Knox Library
Naval Postgraduate School
Monterey, California

3. Marine Corps Representative
Naval Postgraduate School
Monterey, California

4. Director, Training and Education, MCCDC, Code C46
Quantico, Virginia

5. Director, Marine Corps Research Center, MCCDC, Code C40RC
Quantico, Virginia

6. Marine Corps Tactical System Support Activity (Attn: Operations Officer)
Camp Pendleton, California